矢量还是地图,用哪一个?

我听到很多人说,如果容器中预期的元素数量相对较少,那么最好使用 std::vector而不是 std::map,即使您只使用容器进行查找而不进行迭代。

这背后的真正原因是什么?

显然,std::map的查找性能不可能比 std::vector差(尽管它可能在纳秒/微秒之间有所不同) ,那么它是否与内存使用有关?

在分割虚拟地址空间方面,std::vector是否比 std::map做得更好或更差?

我使用的是 Visual Studio 附带的 STL 库(也就是微软的实现)。与其他实现相比,这有什么不同吗?

111591 次浏览

Maps are usually implemented as binary search trees, and walking a binary tree always comes with a little overhead (performing comparisons, walking links, etc.) Vectors are basically just arrays. For very small amounts of data, maybe 8 or 12 elements, sometimes it's faster just to do a linear search over an array than to walk a binary search tree.

You can run some timings yourself to see where the break-even point is -- time a search over four elements, then eight, then sixteen, and so on to find the sweet spot for your particular implementation of the STL.

Maps do tend to have a bunch of small allocations all over the heap, whereas vectors are contiguous so the cache-hit rate of vectors can sometimes be a little better in cases where you're iterating over all the elements from front to back.

I presume you're comparing map<A, B> with vector<pair<A, B> >.

Firstly, finding an item in a very small vector can easily be faster than the same thing in a map, because all the memory in a vector is always contiguous (and so plays more nicely with computers' caches and such things), and the number of comparisons needed to find something in a vector might be about the same as for a map. Finding an element in a map needs fewer operations in the limit of very large containers.

The point where maps become faster than vectors depends on the implementation, on your processor, what data is in the map, and subtle things like what memory is in the processor's cache. Typically, the point where map becomes faster would be about 5-30 elements.

An alternative is to use a hash container. They are often named hash_map or unordered_map. Classes named hash_map are not part of the official standard (and there are a few variants out there); std::tr1::unordered_map is. A hash map is often faster than a normal map for lookups, regardless of how many elements are in it, but whether it is actually faster depends on what the key is, how it is hashed, what values you have to deal with, and how the key is compared in std::map. It doesn't keep things in a specific order like std::map, but you've said that you don't care about that. I'd recommend hash maps particularly if the keys are integers or pointers, because these hash very quickly.

If you're doing all your insertions at once then doing lots of lookups, you can use a vector and sort it when you're through inserting; then use lower_bound to do a quick lookup. It might be faster than using a map, even for large numbers of items.

I think you should use the container that fits the data first and foremost. std::vector is used in situations where you would use an array in C or pre-STL C++: you want a contiguous block of memory to store values with fast constant time look-up. std::map should be used to map keys to values. The primary overlap here is a vector vs a map with a size_t as the key. In that case there are two concerns: are the indexes continuous? If not, you will probably be wasting memory with a vector. Second, what look-up time do you want? A vector has constant time lookup, while std::map is usually implemented as a RB tree, which has a O(log n) look-up time, and even a hash map (such as TR1 unordered_map) usually has a worse complexity, because the index (or a hash thereof) will be mapped to a bucket that can contain multiple values.

If were aiming at a vector with pairs: you could the elements of the vector and use find to find elements. But this is a binary search, and will practically be as fast as a std::map.

Anyway, try to model the data in the obvious manner. Premature optimization often doesn't help much.

"By default, use vector when you need a container" - Bjarne Stroustrup.

Otherwise, I find this little flow chart to be of very good help (edited - probably a valid live new link):

https://ngoduyhoa.blogspot.com/2015/06/summary-of-different-containers.html

Another way to look at this, is if we're talking about small containers, then neither one is going to take very long to look up. Unless you're searching through this container on a very tight loop, the difference in time will probably be negligible.

In that case, I would look for which container more closely matches what you want to do. If you're looking for a particular value, map's built-in find() method will be a lot easier (and less complex to use) than creating a for loop and iterating over a vector.

Your own time is probably worth a lot more than a few nano-seconds here and there.

Basically, maps are used for lookup.

But, sometimes std::vector can be used instead of std::map even for look up.

If there are going to be very less elements in your key-value pairs, then you can go for an iterative search using key even in std::vector<std::pair<x,y>>.

This is because of the fact that hashing takes time, especially for hashing strings and for other operations in map like storing data in heap.

You would only see a better difference in std::map, if you have more elements in which you have to lookup and also when you want to do frequent lookup in the list of elements that you have.