在c++中使用HashMap的最佳方式是什么?

小开

最佳答案

标准库包括有序和无序映射容器(std::map和std::unordered_map)。在有序映射中，元素按键排序，insert和access在O (log n)中。通常标准库内部使用红黑树来进行有序映射。但这只是一个实现细节。在无序映射中，插入和访问是O(1)。它只是哈希表的另一个名称。

(ordered) std::map的例子:

#include <map>
#include <iostream>
#include <cassert>


int main(int argc, char **argv)
{
std::map<std::string, int> m;
m["hello"] = 23;
// check if key is present
if (m.find("world") != m.end())
std::cout << "map contains key world!\n";
// retrieve
std::cout << m["hello"] << '\n';
std::map<std::string, int>::iterator i = m.find("hello");
assert(i != m.end());
std::cout << "Key: " << i->first << " Value: " << i->second << '\n';
return 0;
}

输出:

23
Key: hello Value: 23

如果你需要在容器中排序，并且O(log n)运行时没问题，那么就使用std::map。

否则，如果你真的需要一个哈希表(O(1)插入/访问)，检查std::unordered_map，它具有类似std::map的API(例如，在上面的例子中，你只需要搜索并将map替换为unordered_map)。

unordered_map容器是在11标准c++修订版中引入的。因此，取决于你的编译器，你必须启用c++ 11特性(例如，当使用GCC 4.8时，你必须将-std=c++11添加到CXXFLAGS)。

甚至在c++ 11发布之前，GCC就在命名空间std::tr1中支持unordered_map -。因此，对于旧的GCC编译器，你可以尝试这样使用它:

#include <tr1/unordered_map>


std::tr1::unordered_map<std::string, int> m;

它也是boost的一部分，也就是说，你可以使用相应的boost-header来获得更好的可移植性。

小开

hash_map是一个较老的、非标准化的版本，出于标准化的目的被称为unordered_map(最初在TR1中，从c++ 11开始包含在标准中)。正如它的名字所暗示的，它与std::map的主要不同之处在于它是无序的——例如，如果你从begin()遍历一个映射到end()，你会得到按键__abc8排序的项，但如果你从begin()遍历一个unordered_map到end()，你会得到或多或少任意顺序的项。

unordered_map通常被认为具有恒定的复杂度。也就是说，无论表中有多少项，插入、查找等基本上都需要固定的时间。std::map的复杂度是存储的项数的对数——这意味着插入或检索项的时间会增加，但相当慢慢地，因为映射变大了。例如，如果查找100万个项中的一个需要1微秒，那么查找200万个项中的一个需要大约2微秒，查找400万个项中的一个需要3微秒，查找800万个项中的一个需要4微秒，以此类推。

从实际的角度来看，这并不是故事的全部。本质上，简单哈希表的大小是固定的。使其适应通用容器的可变大小需求并非易事。因此，(可能)增加表的操作(例如，插入)可能相对较慢(也就是说，大多数操作相当快，但周期性地会慢得多)。不能改变表大小的查找通常要快得多。因此，与插入数量相比，大多数基于哈希的表在执行大量查找时往往处于最佳状态。对于插入大量数据，然后遍历表一次以检索结果的情况(例如，计算文件中唯一单词的数量)，std::map可能会同样快，甚至可能更快(但是，同样，计算复杂性是不同的，所以这也取决于文件中唯一单词的数量)。

< p > <一口> 其中顺序在创建映射时由模板的第三个参数定义，默认为std::less<T>。 < /一口> < / p >

小开

下面是一个更完整、更灵活的示例，它没有省略生成编译错误所需的include:

#include <iostream>
#include <unordered_map>


class Hashtable {
std::unordered_map<const void *, const void *> htmap;


public:
void put(const void *key, const void *value) {
htmap[key] = value;
}


const void *get(const void *key) {
return htmap[key];
}


};


int main() {
Hashtable ht;
ht.put("Bob", "Dylan");
int one = 1;
ht.put("one", &one);
std::cout << (char *)ht.get("Bob") << "; " << *(int *)ht.get("one");
}

对于键仍然不是特别有用，除非它们被预定义为指针，因为匹配的值是不行的!(但是，由于我通常使用字符串作为键，在键的声明中用“string”代替“const void *”应该可以解决这个问题。)

小开

std::unordered_map在GCC stdlibc++ 6.4中使用哈希映射的证据

这是在:https://stackoverflow.com/a/3578247/895245中提到的，但在以下答案中:在c++中std::map内部是什么数据结构?我已经为GCC stdlibc++ 6.4实现提供了进一步的证据:

GDB步骤调试成类
性能特征分析

以下是回答中描述的性能特征图的预览:

如何使用自定义类和哈希函数unordered_map

这个答案是:c++ unordered_map使用自定义类类型作为键

摘录:平等:

struct Key
{
std::string first;
std::string second;
int         third;


bool operator==(const Key &other) const
{ return (first == other.first
&& second == other.second
&& third == other.third);
}
};

哈希函数:

namespace std {


template <>
struct hash<Key>
{
std::size_t operator()(const Key& k) const
{
using std::size_t;
using std::hash;
using std::string;


// Compute individual hash values for first,
// second and third and combine them using XOR
// and bit shifting:


return ((hash<string>()(k.first)
^ (hash<string>()(k.second) << 1)) >> 1)
^ (hash<int>()(k.third) << 1);
}
};


}

小开

对于那些试图在使用标准模板的同时找出如何散列我们自己的类的人来说，有一个简单的解决方案:

在你的类中，你需要定义一个相等操作符重载==。如果你不知道如何做到这一点，GeeksforGeeks有一个很好的教程https://www.geeksforgeeks.org/operator-overloading-c/

在标准命名空间下，声明一个名为hash的模板结构，类型为类名(见下文)。我发现了一篇很棒的博客文章，它也展示了一个使用异或和位移位计算哈希的例子，但这超出了这个问题的范围，但它也包括了如何使用哈希函数完成的详细说明https://prateekvjoshi.com/2014/06/05/using-hash-function-in-c-for-user-defined-classes/

namespace std {


template<>
struct hash<my_type> {
size_t operator()(const my_type& k) {
// Do your hash function here
...
}
};


}

因此，为了使用新的哈希函数实现哈希表，你只需要创建一个std::map或std::unordered_map，就像你通常做的那样，并使用my_type作为键，标准库将自动使用你之前(在步骤2中)定义的哈希函数来哈希你的键。

#include <unordered_map>


int main() {
std::unordered_map<my_type, other_type> my_map;
}