MurmurHash 是什么?

我一直试图对 MurmurHash的作用有一个高层次的理解。

我已经阅读了一个基本的描述,但还没有找到一个很好的解释,什么时候使用它和为什么。我知道它很快,但想知道更多。

我询问了一个相关的 有个问题,关于如何将 UUID 放入 Redis 比特集,有人建议使用 MurmurHash。这是有效的,但我想了解其中的风险/收益。

60783 次浏览

Murmur is a family of good general purpose hashing functions, suitable for non-cryptographic usage. As stated by Austin Appleby, MurmurHash provides the following benefits:

  • simple (in term of number of generated assembly instructions).
  • good distribution (passing chi-squared tests for practically all keysets & bucket sizes.
  • good avalanche behavior (max bias of 0.5%).
  • good collision resistance (passes Bob Jenkin's frog.c torture-test. No collisions possible for 4-byte keys, no small (1- to 7-bit) differentials).
  • great performance on Intel/AMD hardware, good tradeoff between hash quality and CPU consumption.

You can certainly use it to hash UUIDs (like any other advanced hashing functions: CityHash, Jenkins, Paul Hsieh's, etc ...). Now, a Redis bitset is limited to 4 GB bits (512 MB). So you need to reduce 128 bits of data (UUID) to 32 bits (hashed value). Whatever the quality of the hashing function, there will be collisions.

Using an engineered hash function like Murmur will maximize the quality of the distribution, and minimize the number of collisions, but it offers no other guarantee.

Here are some links comparing the quality of general purpose hash functions:

http://www.azillionmonkeys.com/qed/hash.html

http://www.strchr.com/hash_functions

http://blog.aggregateknowledge.com/2011/12/05/choosing-a-good-hash-function-part-1/

http://blog.aggregateknowledge.com/2011/12/29/choosing-a-good-hash-function-part-2/

http://blog.aggregateknowledge.com/2012/02/02/choosing-a-good-hash-function-part-3/

MurmurHash can be return negtive value, original value bit AND against 0x7fffffff。that is value & 0x7fffffff .When the input is positive, the original value is returned. When the input number is negative, the returned positive value is the original value bit AND against 0x7fffffff which is not its absolutely value. note:MurmurHash's return value can not be fix length.