可散列的,不可变的

从最近的一个 SO 问题(参见 在 python 中创建一个由列表索引的字典)中,我意识到我可能对 python 中散列对象和不可变对象的含义有一个错误的概念。

  • 散列表在实践中是什么意思?
  • 散列和不可变之间的关系是什么?
  • 是否有可变对象是散列的或不可变对象不是散列的?
33340 次浏览

Hashing is the process of converting some large amount of data into a much smaller amount (typically a single integer) in a repeatable way so that it can be looked up in a table in constant-time (O(1)), which is important for high-performance algorithms and data structures.

Immutability is the idea that an object will not change in some important way after it has been created, especially in any way that might change the hash value of that object.

The two ideas are related because objects which are used as hash keys must typically be immutable so their hash value doesn't change. If it was allowed to change then the location of that object in a data structure such as a hashtable would change and then the whole purpose of hashing for efficiency is defeated.

To really grasp the idea you should try to implement your own hashtable in a language like C/C++, or read the Java implementation of the HashMap class.

Technically, hashable means that the class defines __hash__(). According to the docs:

__hash__() should return an integer. The only required property is that objects which compare equal have the same hash value; it is advised to somehow mix together (e.g. using exclusive or) the hash values for the components of the object that also play a part in comparison of objects.

I think that for the Python builtin types, all hashable types are also immutable.

It would be difficult but perhaps not impossible to have a mutable object that nonetheless defined __hash__().

Hashable means that an variable's value can be represented (or, rather, encoded) by a constant -- string, number, etc. Now, something that is subject to change (mutable) cannot be represented by something that is not. Therefore, any variable that is mutable cannot be hashable and, by the same token, only immutable variables can be hashable.

Hope this helps ...

In Python they're mostly interchangeable; since the hash is supposed to represent the content, so it's just as mutable as the object, and having an object change the hash value would make it unusable as a dict key.

In other languages, the hash value is more related to the objects 'identity', and not (necessarily) to the value. Thus, for a mutable object, the pointer could be used to start the hashing. Assuming, of course, that an object doesn't move in memory (as some GC do). This is the approach used in Lua, for example. This makes a mutable object usable as a table key; but creates several (unpleasant) surprises for newbies.

In the end, having an immutable sequence type (tuples) makes it nicer for 'multi-value keys'.

From the Python Glossary:

An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() or __cmp__() method). Hashable objects which compare equal must have the same hash value.

Hashability makes an object usable as a dictionary key and a set member, because these data structures use the hash value internally.

All of Python’s immutable built-in objects are hashable, while no mutable containers (such as lists or dictionaries) are. Objects which are instances of user-defined classes are hashable by default; they all compare unequal, and their hash value is their id().

Dicts and sets must use a hash for efficient lookup in a hash table; the hash values must be immutable, because changing the hash will mess up the data structures and cause the dict or set to fail. The easiest way to make the hash value immutable is to make the whole object immutable, which is why the two are often mentioned together.

While none of the built-in mutable objects are hashable, it is possible to make a mutable object with a hash value that's not mutable. It's common for only a portion of the object to represent its identity, while the rest of the object contains properties that are free to change. As long as the hash value and comparison functions are based on the identity but not the mutable properties, and the identity never changes, you've met the requirements.

just because this is the top Google hit, here's a simple way to make a mutable object hashable:

>>> class HashableList(list):
...  instancenumber = 0  # class variable
...  def __init__(self, initial = []):
...   super(HashableList, self).__init__(initial)
...   self.hashvalue = HashableList.instancenumber
...   HashableList.instancenumber += 1
...  def __hash__(self):
...   return self.hashvalue
...
>>> l = [1,2,3]
>>> m = HashableList(l)
>>> n = HashableList([1,2,3])
>>> m == n
True
>>> a={m:1, n:2}
>>> a[l] = 3
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
>>> m.hashvalue, n.hashvalue
(0, 1)

I actually found a use for something like this when creating a class to cast SQLAlchemy records into something mutable and more useful to me, while maintaining their hashability for use as dict keys.

There is an implicit even if there is no explicit relationship forced between immutable and hashable due the interplay between

  1. Hashable objects which compare equal must have the same hash value
  2. An object is hashable if it has a hash value which never changes during its lifetime.

There is no problem here unless you redefine __eq__ so the the objects class defines equivalence on value.

Once you've done that you need to find a stable hash function which always returns the same value for objects which represent the same value (eg, where __eq__) returns True, and never changes during the lifetime of an object.

It hard to see an application where this is possible, consider a possible class A which meets these requirements. Although there is the obvious degenerate case where __hash__ returns a constant.

Now:-

>>> a = A(1)
>>> b = A(1)
>>> c = A(2)
>>> a == b
True
>>> a == c
False
>>> hash(a) == hash(b)
True
>>> a.set_value(c)
>>> a == c
True
>>> assert(hash(a) == hash(c)) # Because a == c => hash(a) == hash(c)
>>> assert(hash(a) == hash(b)) # Because hash(a) and hash(b) have compared equal
before and the result must stay static over the objects lifetime.

In fact this means at creation hash(b) == hash(c), despite the fact there are never compared equal. I struggle to see anyway to usefully define __hash__() for a mutable object which defines compare by value.

Note: __lt__, __le__ , __gt__ and __ge__ comparsions aren't affected so you can still define an ordering of hashable objects, mutable or otherwise based on their value.

Immutable means that object will not change in any significant manner during its lifetime. It's a vague but common idea in programming languages.

Hashability is slightly different, and refers to comparison.

changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() or __cmp__() method). Hashable objects which compare equal must have the same hash value.

All user-defined classes have __hash__ method, which by default just returns the object ID. So an object that meets the criteria for hashability is not necessarily immutable.

Objects of any new class you declare can be used as a dictionary key, unless you prevent it by, for example, throwing from __hash__

We could say that all immutable objects are hashable, because if the hash changes during the object's lifetime, then it means that the object mutated.

But not quite. Consider a tuple which has a list (mutable). Some say tuple is immutable, but at the same time it is somewhat not hashable (throws).

d = dict()
d[ (0,0) ] = 1    #perfectly fine
d[ (0,[0]) ] = 1  #throws

Hashability and immutability refer to object instancess, not type. For example, an object of type tuple can be hashable or not.

  • Are there mutable objects that are hashable or immutable objects that are not hashable?

In Python, tuple is immutable, but it is hashable only if all its elements are hashable.

>>> tt = (1, 2, (30, 40))
>>> hash(tt)
8027212646858338501
>>> tl = (1, 2, [30, 40])
>>> hash(tl)
TypeError: unhashable type: 'list'

Hashable Types

  • The atomic immutable types are all hashable, such as str, bytes, numeric types
  • A frozen set is always hashable(its elements must be hashable by definition)
  • A tuple is hashable only if all its elements are hashable
  • User-defined types are hashable by default because their hash value is their id()