Difference between std::set and std::priority_queue

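Since std::priority_queue and std::set (as well as std::multiset) are both data containers that store elements and allow you to access them in an ordered fashion, and both have the same insertion complexity O(log n), what are the advantages of using one over the other (or, what kind of situations call for one or the other)?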

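While I know that the underlying structures are different, I am not so much interested in the difference in their implementation as in how they compare in performance and suitability for various uses.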

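Note: I know that a set holds no duplicates. That's why I also mentioned std::multiset, since it has exactly the same behavior as std::set but can be used where the stored data is allowed to compare as equal elements. So please don't comment on the single-key/multi-key issue.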


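std::set/std::multiset are generally backed by a balanced binary search tree (typically a red-black tree). http://en.wikipedia.org/wiki/Binary_tree

std::priority_queue is a container adaptor that maintains a heap over an underlying container (std::vector by default). http://en.wikipedia.org/wiki/Heap_(data_structure)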

So the question is really when should you use a binary tree instead of a heap?

Both structures are laid out as trees; however, the rules governing the relationship between a node and its children are different.

We will call the positions P for parent, L for left child, and R for right child.

In a binary tree L < P < R.

In a (min-)heap P < L and P < R.

So binary trees sort "sideways" and heaps sort "upwards".

So if we look at this as a triangle, then in the binary tree L, P, R are completely sorted, whereas in the heap the relationship between L and R is unknown (only their relationship to P is known).

This has the following effects:

  • If you have an unsorted array and want to turn it into a binary tree, it takes O(n log n) time. If you want to turn it into a heap, it only takes O(n) time, since bottom-up heap construction only has to establish the parent/child ordering rather than sort everything.

  • Heaps are more efficient if you only need the extreme element (lowest or highest by some comparison function). Heaps only do the comparisons (lazily) necessary to determine the extreme element.

  • Binary trees perform the comparisons necessary to order the entire collection, and keep the entire collection sorted at all times.

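  • Heaps have constant-time lookup (peek) of the lowest element; a plain binary tree needs a logarithmic walk down to its leftmost node to find the lowest element (std::set caches that node, though, so its begin() is O(1)).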

A priority queue only gives you access to one element in sorted order -- i.e., you can get the highest priority item, and when you remove that, you can get the next highest priority, and so on. A priority queue also allows duplicate elements, so it's more like a multiset than a set. [Edit: As @Tadeusz Kopec pointed out, building a heap is also linear in the number of items in the heap, whereas building a set is O(N log N) unless it's being built from a sequence that's already ordered (in which case it is also linear).]

A set allows you full access in sorted order, so you can, for example, find two elements somewhere in the middle of the set, then traverse in order from one to the other.

std::priority_queue allows you to do the following:

  1. Insert an element O(log n)
  2. Get the top element O(1) (the largest by default; the smallest when the comparator is std::greater)
  3. Erase the top element O(log n)

while std::set has more possibilities:

  1. Insert any element O(log n), with a larger constant factor than std::priority_queue
  2. Find any element O(log n)
  3. Find the first element >= the one you are looking for O(log n) (lower_bound)
  4. Erase any element O(log n)
  5. Erase any element by its iterator O(1) amortized
  6. Move to the previous/next element in sorted order O(1) amortized (a single step can take O(log n))
  7. Get the smallest element O(1)
  8. Get the largest element O(1)


Even though insert and erase operations for both containers have the same complexity O(log n), these operations are slower for std::set than for std::priority_queue. That's because std::set makes many memory allocations: every element of std::set is stored in its own allocation, whereas std::priority_queue (with its default underlying std::vector container) stores all elements in a single contiguous buffer. On the other hand, std::priority_queue performs many swap operations on its elements, whereas std::set just swaps pointers. So if swapping is a very slow operation for the element type, using std::set may be more efficient. Moreover, the element type may not be swappable at all.

The memory overhead of std::set is also much bigger, because every node has to store several pointers to other nodes in addition to the element itself.