In Python, I know that the value __hash__
returns for a given object is supposed to be the same for the lifetime of that object. But, out of curiosity, what happens if it isn't? What sort of havoc would this cause?
class BadIdea(object):
def __hash__(self):
return random.randint(0, 10000)
I know __contains__
and __getitem__
would behave strangely, and dicts and sets would act odd because of that. You also might end up with "orphaned" values in the dict/set.
What else could happen? Could it crash the interpreter, or corrupt internal structures?
Your main problem would indeed be with dicts and sets. If you insert an object into a dict/set, and that object's hash changes, then when you try to retrieve that object you will end up looking in a different spot in the dict/set's underlying array and hence won't find the object. This is precisely why dict keys should always be immutable.
Here's a small example: let's say we put o
into a dict, and o
's initial hash is 3. We would do something like this (a slight simplification but gets the point across):
Hash table: 0 1 2 3 4 5 6 7 +---+---+---+---+---+---+---+---+ | | | | o | | | | | +---+---+---+---+---+---+---+---+ ^ we put o here, since it hashed to 3
Now let's say the hash of o
changes to 6
. If we want to retrieve o
from the dict, we'll look at spot 6
, but there's nothing there! This will cause a false negative when querying the data structure. In reality, each element of the array above could have a "value" associated with it in the case of a dict, and there could be multiple elements in a single spot (e.g. a hash collision). Also, we'd generally take the hash value modulo the size of the array when deciding where to put the element. Irrespective of all these details, though, the example above still accurately conveys what could go wrong when the hash code of an object changes.
Could it crash the interpreter, or corrupt internal structures?
No, this won't happen. When we say an object's hash changing is "dangerous", we mean dangerous in the sense that it essentially defeats the purpose of hashing and makes the code difficult if not impossible to reason about. We don't mean dangerous in the sense that it could cause crashes.