python hash deterministic dill

How to make dill deterministic?


We intend to use dill to generate a hash of a function. Our previous approach hashed the bytecode, but that is slower and adds an unnecessary step if we later decide to unpickle the function. The problem is that successive calls on the same lambda give different hashes:

import dill as d
from hashlib import md5
md5(d.dumps(lambda x: {"y": x+2})).hexdigest()
# output: 'f063cdd725f0e6f5a1d211925a1024b1'

import dill as d
from hashlib import md5
md5(d.dumps(lambda x: {"y": x+2})).hexdigest()
# output: 'ea85fa41e85f0c78c54bbe0e00e55798'

Solution

  • You can't. The dill result of a function includes the id of the function. If you define the function explicitly:

    def fn(x):
        return {"y": x+2}
    

    then you get the same dill result every time, but only until you add another function to the file. Doing so causes this function's dill result to change.
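
    For comparison, here is a minimal sketch of the bytecode approach the question mentions, which does not depend on dill at all. The helper name `code_hash` is hypothetical, and which attributes of the code object to include is an assumption, not something taken from the question. It only covers functions whose constants are simple values (numbers, strings, None), and the result should only be expected to stay stable within one Python version, since bytecode changes between versions.

```python
import hashlib


def code_hash(fn):
    """Hash a function's bytecode plus the constants and names it uses.

    Hypothetical helper: ignores defaults, closures, and globals.
    """
    code = fn.__code__
    h = hashlib.md5()
    # Raw bytecode; this does not include file name or line numbers.
    h.update(code.co_code)
    # repr() is stable for simple constants; a nested code object
    # (e.g. an inner lambda) would embed a memory address and break this.
    h.update(repr(code.co_consts).encode())
    h.update(repr(code.co_names).encode())
    h.update(repr(code.co_varnames).encode())
    return h.hexdigest()


a = lambda x: {"y": x + 2}
b = lambda x: {"y": x + 2}
c = lambda x: {"y": x + 3}

print(code_hash(a) == code_hash(b))  # same source, same hash
print(code_hash(a) == code_hash(c))  # different constant, different hash
```

    This trades the ability to unpickle the function later for a hash that does not change when unrelated code is added to the file.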