Hash table runtime complexity (insert, search and delete)

Why do I keep seeing different runtime complexities quoted for these operations on a hash table?

On Wikipedia, search and delete are listed as O(n) (I thought the whole point of a hash table was constant-time lookup, so if search is O(n), what's the point?).

In some course notes from a while back, I saw a wide range of complexities depending on certain details, including one where everything is O(1). If I can get O(1) for everything, why would anyone use any other implementation?

If I use a standard hash table in a language like C++ or Java, what time complexity can I expect?


It depends on how you implement the hashing: in the worst case it can degrade to O(n), in the best case it is O(1) (which you can generally achieve easily if your data structure is not too big).

Perhaps you were looking at the space complexity? That is O(n). The other complexities are as expected for a hash table entry. The search complexity approaches O(1) as the number of buckets increases. In the worst case, if you have only one bucket in the hash table, then the search complexity is O(n).

Edit in response to a comment: I don't think it is correct to say O(1) is the average case. It really is (as the Wikipedia page says) O(1 + n/k), where k is the number of buckets in the hash table. If k is large enough, then the result is effectively O(1). But suppose k is 10 and n is 100. In that case each bucket will hold 10 entries on average, so the search time is definitely not O(1); it is a linear search through up to 10 entries.
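A minimal sketch (in Java, with the same made-up numbers as above, class name invented for illustration) of where the n/k term comes from: distribute n keys into k buckets and look at the average chain length a search would have to scan.

```java
import java.util.ArrayList;
import java.util.List;

public class ChainLengthDemo {
    public static void main(String[] args) {
        int k = 10;   // number of buckets (the k above)
        int n = 100;  // number of entries (the n above)

        List<List<Integer>> buckets = new ArrayList<>();
        for (int i = 0; i < k; i++) buckets.add(new ArrayList<>());

        // Insert n keys; Integer.hashCode(x) is just x, which is enough for the demo.
        for (int key = 0; key < n; key++) {
            int index = Math.floorMod(Integer.hashCode(key), k);
            buckets.get(index).add(key);
        }

        // On average each bucket holds n/k entries, so a search does
        // roughly 1 hash computation + n/k comparisons.
        System.out.println("average chain length = " + (double) n / k); // 10.0
    }
}
```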

Ideally, a hash table is O(1). The problem arises when two keys are not equal but result in the same hash.

For example, imagine the strings "it was the best of times it was the worst of times" and "Green Eggs and Ham" both resulted in a hash value of 123.

When the first string is inserted, it's put in bucket 123. When the second string is inserted, the table sees that a value already exists in bucket 123. It then compares the new key to the existing one and sees they are not equal. In this case, an array or linked list is created for that bucket. At this point, retrieving either value becomes O(n), because the hash table needs to iterate through each entry in that bucket to find the desired one.
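A rough sketch of that bucket-plus-list scheme (separate chaining); the class and method names here are made up for illustration and are not taken from any particular library:

```java
import java.util.ArrayList;
import java.util.List;

class ChainedHashTable<K, V> {
    private static class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final List<List<Entry<K, V>>> buckets;

    ChainedHashTable(int capacity) {
        buckets = new ArrayList<>(capacity);
        for (int i = 0; i < capacity; i++) buckets.add(new ArrayList<>());
    }

    private int indexFor(K key) {
        return Math.floorMod(key.hashCode(), buckets.size());
    }

    void put(K key, V value) {
        List<Entry<K, V>> chain = buckets.get(indexFor(key));
        for (Entry<K, V> e : chain) {
            if (e.key.equals(key)) { e.value = value; return; } // same key: overwrite
        }
        chain.add(new Entry<>(key, value)); // new key (possibly a collision): append to the chain
    }

    V get(K key) {
        // This loop is where the O(n) worst case hides: every colliding
        // entry in the bucket may have to be compared against the key.
        for (Entry<K, V> e : buckets.get(indexFor(key))) {
            if (e.key.equals(key)) return e.value;
        }
        return null;
    }
}
```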

For this reason, when using a hash table, it's important to use a key with a really good hash function that's both fast and rarely produces the same hash value for different objects.

Make sense?

Hash tables have O(1) average and amortized case complexity, but they suffer from O(n) worst-case time complexity. [And I think this is where your confusion comes from.]

Hash tables suffer from O(n) worst-case time complexity for two reasons:

  1. If too many elements hash to the same bucket, searching inside that bucket may take O(n) time.
  2. Once a hash table exceeds its load factor, it has to rehash [create a new, bigger table and re-insert each element into it].

However, it is said to be O(1) average and amortized case because:

  1. It is very rare that many items will hash to the same bucket [if you choose a good hash function and keep the load factor reasonable].
  2. The rehash operation, which is O(n), can happen at most once every n/2 operations, all of which are assumed to be O(1). Thus, when you average the time per operation, you get (n*O(1) + O(n)) / n = O(1). A sketch of the rehash step follows this list.
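A toy sketch (in Java, names invented for illustration) of when the rehash is triggered and why that single step costs O(n):

```java
import java.util.ArrayList;
import java.util.List;

class ResizingHashSet {
    private static final double MAX_LOAD_FACTOR = 0.75;

    private List<List<Integer>> buckets = newBuckets(8);
    private int size = 0;

    private static List<List<Integer>> newBuckets(int capacity) {
        List<List<Integer>> b = new ArrayList<>(capacity);
        for (int i = 0; i < capacity; i++) b.add(new ArrayList<>());
        return b;
    }

    void add(int key) {
        // Rare O(n) step: its cost is spread over all the O(1) inserts before it.
        if ((double) (size + 1) / buckets.size() > MAX_LOAD_FACTOR) {
            rehash();
        }
        List<Integer> chain = buckets.get(Math.floorMod(key, buckets.size()));
        if (!chain.contains(key)) {
            chain.add(key);
            size++;
        }
    }

    private void rehash() {
        List<List<Integer>> old = buckets;
        buckets = newBuckets(old.size() * 2); // double the capacity
        // Re-insert every existing element: this pass is what makes rehashing O(n).
        for (List<Integer> chain : old) {
            for (int key : chain) {
                buckets.get(Math.floorMod(key, buckets.size())).add(key);
            }
        }
    }
}
```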

Note that because of the rehashing issue, real-time applications and applications that need low latency should not use a hash table as their data structure.

EDIT: Another issue with hash tables: cache
Another place where you might see a performance loss with large hash tables is cache performance. Hash tables suffer from poor cache locality, so for a large collection the access time may be longer, since you need to reload the relevant part of the table from memory back into the cache.

Some hash tables (cuckoo hashing) have guaranteed O(1) lookup
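For context, a stripped-down sketch of why a cuckoo-hashed lookup is worst-case O(1): each key has exactly two possible homes, so a lookup probes at most two slots. The insertion routine (with its eviction loop) is omitted here, and the two hash functions are arbitrary placeholders, not the ones a real implementation would use.

```java
class CuckooTable {
    private final Integer[] table1;
    private final Integer[] table2;

    CuckooTable(int capacity) {
        table1 = new Integer[capacity];
        table2 = new Integer[capacity];
    }

    // Placeholder hash functions purely for illustration.
    private int hash1(int key) { return Math.floorMod(key, table1.length); }
    private int hash2(int key) { return Math.floorMod(key * 31 + 7, table2.length); }

    boolean contains(int key) {
        // At most two probes, regardless of how many elements are stored:
        // that is the guaranteed O(1) lookup.
        Integer a = table1[hash1(key)];
        Integer b = table2[hash2(key)];
        return (a != null && a == key) || (b != null && b == key);
    }
}
```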