Why is a dictionary so much faster than a list?

I am testing the speed of getting data from a Dictionary versus a List.
I used this code to test:

    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Linq;

    internal class Program
    {
        private static void Main(string[] args)
        {
            var stopwatch = new Stopwatch();
            List<Grade> grades = Grade.GetData().ToList();
            List<Student> students = Student.GetStudents().ToList();

            // Approach 1: scan the list of grades once per student.
            stopwatch.Start();
            foreach (Student student in students)
            {
                student.Grade = grades.Single(x => x.StudentId == student.Id).Value;
            }
            stopwatch.Stop();
            Console.WriteLine("Using list {0}", stopwatch.Elapsed);

            // Approach 2: build a dictionary once, then look each student up by key.
            stopwatch.Reset();
            students = Student.GetStudents().ToList();
            stopwatch.Start();
            Dictionary<Guid, string> dic = Grade.GetData().ToDictionary(x => x.StudentId, x => x.Value);
            foreach (Student student in students)
            {
                student.Grade = dic[student.Id];
            }
            stopwatch.Stop();
            Console.WriteLine("Using dictionary {0}", stopwatch.Elapsed);
            Console.ReadKey();
        }
    }

    // Shared pool of ids so that every student has exactly one matching grade.
    public class GuidHelper
    {
        public static List<Guid> ListOfIds = new List<Guid>();

        static GuidHelper()
        {
            for (int i = 0; i < 10000; i++)
            {
                ListOfIds.Add(Guid.NewGuid());
            }
        }
    }

    public class Grade
    {
        public Guid StudentId { get; set; }
        public string Value { get; set; }

        public static IEnumerable<Grade> GetData()
        {
            for (int i = 0; i < 10000; i++)
            {
                yield return new Grade
                {
                    StudentId = GuidHelper.ListOfIds[i],
                    Value = "Value " + i
                };
            }
        }
    }

    public class Student
    {
        public Guid Id { get; set; }
        public string Name { get; set; }
        public string Grade { get; set; }

        public static IEnumerable<Student> GetStudents()
        {
            for (int i = 0; i < 10000; i++)
            {
                yield return new Student
                {
                    Id = GuidHelper.ListOfIds[i],
                    Name = "Name " + i
                };
            }
        }
    }

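I have a list of students and a list of grades in memory, and they have StudentId in common.
In the first approach, I tried to find each student's grade using LINQ on a list, which takes nearly 7 seconds on my machine. In the second approach, I first converted the List into a dictionary and then looked up each student's grade by key, which takes less than a second.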


When using Dictionary you are using a key to retrieve your information, which lets it find the entry much more efficiently. With List you are using the Single LINQ expression, which, since it is a list, has no option other than to look through the entire list for the wanted item.

Dictionary uses hashing to search for the data. Each item in the dictionary is stored in a bucket together with the other items whose keys hash to that same bucket, so a lookup only has to examine one bucket. It's a lot quicker.
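The bucket idea can be sketched roughly as follows. This is a simplified, hypothetical `TinyHashMap` using separate chaining; it is not how `Dictionary<TKey, TValue>` is actually implemented, and the names here are made up for illustration. It only shows why a lookup touches one bucket instead of the whole collection.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified hash map (separate chaining), for illustration only.
public class TinyHashMap<TKey, TValue>
{
    private readonly List<KeyValuePair<TKey, TValue>>[] _buckets;

    public TinyHashMap(int bucketCount)
    {
        _buckets = new List<KeyValuePair<TKey, TValue>>[bucketCount];
        for (int i = 0; i < bucketCount; i++)
            _buckets[i] = new List<KeyValuePair<TKey, TValue>>();
    }

    // Map a key to a bucket: hash code, made non-negative, modulo the bucket count.
    public int BucketIndex(TKey key)
    {
        return (key.GetHashCode() & 0x7FFFFFFF) % _buckets.Length;
    }

    public void Add(TKey key, TValue value)
    {
        _buckets[BucketIndex(key)].Add(new KeyValuePair<TKey, TValue>(key, value));
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        // Only the single bucket the key hashes to is scanned.
        foreach (var pair in _buckets[BucketIndex(key)])
        {
            if (EqualityComparer<TKey>.Default.Equals(pair.Key, key))
            {
                value = pair.Value;
                return true;
            }
        }
        value = default(TValue);
        return false;
    }
}

public static class TinyHashMapDemo
{
    public static void Main()
    {
        var map = new TinyHashMap<int, string>(16);
        map.Add(3, "three");
        map.Add(19, "nineteen"); // 19 % 16 == 3, so it shares a bucket with 3

        string value;
        Console.WriteLine(map.TryGetValue(19, out value) ? value : "not found");
        Console.WriteLine(map.BucketIndex(3) == map.BucketIndex(19));
    }
}
```

Two keys landing in the same bucket (a collision) still work; the lookup just compares a few more entries inside that bucket.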

Try sorting your list; it will be a bit quicker then, provided you search it with a binary search rather than a linear scan.
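As a rough sketch of the sorting suggestion: sorting only pays off if the lookup then uses a binary search, for example `List<T>.BinarySearch`. The example below uses plain `int` keys as a stand-in for student ids, which is an assumption made to keep it short.

```csharp
using System;
using System.Collections.Generic;

public static class SortedListLookupDemo
{
    // Returns true if key is present; sortedKeys must already be sorted ascending.
    public static bool ContainsSorted(List<int> sortedKeys, int key)
    {
        // BinarySearch returns a non-negative index on a hit, or a negative
        // number (the bitwise complement of the insertion point) on a miss.
        return sortedKeys.BinarySearch(key) >= 0;
    }

    public static void Main()
    {
        var keys = new List<int> { 42, 7, 19, 3, 25 };
        keys.Sort(); // O(n log n) once; each later lookup is O(log n)

        Console.WriteLine(ContainsSorted(keys, 19));
        Console.WriteLine(ContainsSorted(keys, 20));
    }
}
```

This is still slower per lookup than a hash-based dictionary, but far better than scanning the whole list every time.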

When you do this:

    student.Grade = grades.Single(x => x.StudentId == student.Id).Value;

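As written, it has to enumerate the List, checking each entry against the lambda (does entry 0 match? No... Does entry 1 match? No... and so on) until it finds the entry with the correct StudentId. Because Single must also verify that no second entry matches, it keeps going to the end of the list even after it has found one. This is O(n). Since you do it once for every student, the whole loop is O(n^2).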

However when you do this:

    student.Grade = dic[student.Id];

If you want to find a certain element by key in a dictionary, it can jump almost directly to where that element is stored. This is O(1) per lookup, so O(n) for doing it for every student. (If you want to know how this is done: Dictionary runs a mathematical operation on the key, which turns it into a value that is a place inside the dictionary, the same place it put the entry when it was inserted.)

So, dictionary is faster because you used a better algorithm.
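To make the O(n^2) versus O(n) difference concrete, here is a small sketch that counts work instead of measuring time. The class and method names are hypothetical, and `int` ids stand in for the Guids in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LookupCostDemo
{
    // Counts how often the predicate runs when every id is found with Single.
    public static long CountListComparisons(int n)
    {
        List<int> ids = Enumerable.Range(0, n).ToList();
        long comparisons = 0;
        foreach (int id in ids)
        {
            // Single checks every element to be sure there is exactly one match.
            ids.Single(x => { comparisons++; return x == id; });
        }
        return comparisons;
    }

    // Counts keyed lookups for the same job: one per id.
    public static long CountDictionaryLookups(int n)
    {
        Dictionary<int, string> byId =
            Enumerable.Range(0, n).ToDictionary(i => i, i => "Value " + i);
        long lookups = 0;
        for (int id = 0; id < n; id++)
        {
            string value = byId[id];
            if (value != null) lookups++;
        }
        return lookups;
    }

    public static void Main()
    {
        Console.WriteLine("List comparisons:   " + CountListComparisons(1000));
        Console.WriteLine("Dictionary lookups: " + CountDictionaryLookups(1000));
    }
}
```

With 1,000 ids the list approach runs the predicate 1,000 × 1,000 times, while the dictionary approach performs 1,000 keyed lookups. At the 10,000 entries used in the question, the gap grows to 100,000,000 versus 10,000.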

Dictionary is based on a hash table, which is a rather efficient data structure for looking things up. In a list you have to go element by element in order to find something.

It's all a matter of data organization...

The reason is that a dictionary is a lookup, while a list is an iteration.

Dictionary uses a hash lookup, while your list requires walking through the list from the beginning until it reaches the result, each time.

To put it another way: the list will be faster than the dictionary on the first item, because there's nothing to look up. It's the first item, boom, it's done. But the second time, the list has to look through the first item, then the second item. The third time through, it has to look through the first item, then the second item, then the third item, and so on.

So with each iteration the lookup takes more and more time. The larger the list, the longer it takes. The dictionary, by contrast, has a more or less fixed lookup time (it also increases as the dictionary gets larger, but at a much slower pace, so by comparison it's almost fixed).

When it comes to looking up data, a keyed collection is generally faster than a non-keyed collection, and the gap widens as the collection grows. This is because a non-keyed collection has to enumerate its elements to find what you are looking for, while in a keyed collection you can access the element directly via the key.

These are some nice articles for comparing list to dictionary.

Here. And this one.

A dictionary uses a hash table. It is a great data structure because it maps an input to a corresponding output almost instantaneously: it has a complexity of O(1), as already pointed out, which means more or less immediate retrieval.

The downside is that, for the sake of performance, you need lots of space in advance (depending on the implementation, be it separate chaining or linear/quadratic probing, you may need at least as much as you're planning to store, probably double in the latter case), and you need a good hashing algorithm that maps your input ("John Smith"), ideally uniquely, to a corresponding output such as a position in an array (hash_array[34521]).

Also listing the entries in a sorted order is a problem. If I may quote Wikipedia:

Listing all n entries in some specific order generally requires a separate sorting step, whose cost is proportional to log(n) per entry.
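A short sketch of what that separate sorting step looks like with a .NET `Dictionary`. The sample names and scores are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SortedListingDemo
{
    // Returns the dictionary's keys in ordinal string order.
    public static List<string> KeysInOrder(Dictionary<string, int> scores)
    {
        // A hash table keeps no useful ordering, so the keys are sorted explicitly.
        return scores.Keys.OrderBy(k => k, StringComparer.Ordinal).ToList();
    }

    public static void Main()
    {
        var scores = new Dictionary<string, int>
        {
            { "Carol", 71 },
            { "Alice", 90 },
            { "Bob", 85 }
        };

        Console.WriteLine(string.Join(", ", KeysInOrder(scores))); // Alice, Bob, Carol
    }
}
```

If sorted enumeration is needed often, a tree-based collection such as `SortedDictionary<TKey, TValue>` may be a better fit, at the price of O(log n) lookups.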

Have a read on linear probing and separate chaining for some gorier details :)

From MSDN: the Dictionary documentation mentions close to O(1), but I think it depends on the types involved.

The Dictionary(TKey,TValue) generic class provides a mapping from a set of keys to a set of values. Each addition to the dictionary consists of a value and its associated key. Retrieving a value by using its key is very fast, close to O(1), because the Dictionary class is implemented as a hash table.

Note: The speed of retrieval depends on the quality of the hashing algorithm of the type specified for TKey.

List(T) does not implement a hash lookup, so searching it is sequential and the performance is O(n). It also depends on the types involved, and boxing/unboxing needs to be considered.
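The MSDN note about hash quality can be illustrated with a small experiment. `DemoKey` below is a hypothetical key type whose `GetHashCode` can be made deliberately bad (a constant), and the sketch counts how often `Equals` gets called. The exact counts depend on the runtime's Dictionary implementation, so only the relative difference matters.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type with switchable hash quality, for illustration only.
public sealed class DemoKey
{
    public static long EqualsCalls; // how often Equals has been invoked

    private readonly int _id;
    private readonly bool _constantHash;

    public DemoKey(int id, bool constantHash)
    {
        _id = id;
        _constantHash = constantHash;
    }

    public override bool Equals(object obj)
    {
        EqualsCalls++;
        var other = obj as DemoKey;
        return other != null && other._id == _id;
    }

    // A constant hash makes every key collide; a per-id hash spreads keys out.
    public override int GetHashCode()
    {
        return _constantHash ? 1 : _id;
    }
}

public static class HashQualityDemo
{
    // Inserts n keys, looks each one up again, and returns the Equals call count.
    public static long MeasureEqualsCalls(int n, bool constantHash)
    {
        DemoKey.EqualsCalls = 0;
        var map = new Dictionary<DemoKey, int>();
        for (int i = 0; i < n; i++)
            map[new DemoKey(i, constantHash)] = i;
        for (int i = 0; i < n; i++)
        {
            if (map[new DemoKey(i, constantHash)] != i)
                throw new InvalidOperationException("unexpected value");
        }
        return DemoKey.EqualsCalls;
    }

    public static void Main()
    {
        Console.WriteLine("Equals calls, distinct hashes: " + MeasureEqualsCalls(1000, false));
        Console.WriteLine("Equals calls, constant hash:   " + MeasureEqualsCalls(1000, true));
    }
}
```

With the constant hash, every lookup has to compare against a long chain of colliding keys, so the dictionary degrades toward the same sequential behaviour as the list.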