python binary-reproducibility

Reproducible builds in Python


I need to ship a compiled version of a Python script and be able to prove (using a hash) that the compiled file is indeed the same as the original one.

What we use so far is a simple:

find . -name "*.py" -print0 | xargs -0 python2 -m py_compile

The issue is that this is not reproducible: two runs do not give us the same .pyc for the same Python file, and I am not sure which factors fluctuate. This forces us to always ship the same compiled version, instead of simply giving the build script to anyone who wants to produce a new compiled version.

Is there a way to achieve that?

Thanks


Solution

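  • Compiled Python files start with a four-byte magic number followed by a four-byte timestamp. py_compile records the modification time of the source file there, so the value changes whenever the source's mtime changes (for example after a fresh checkout or copy). This probably accounts for the discrepancies you are seeing.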

    If you omit bytes 5-8 from the checksumming process then you should see constant checksums for a given version of Python.

    The format of the .pyc file is given in this blog post by Ned Batchelder.
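
A minimal sketch of the checksum described above, assuming the Python 2 layout of a four-byte magic number followed by a four-byte timestamp. The function names and the choice of SHA-256 are illustrative and not part of any standard tool. Newer Python 3 releases use a longer header with additional fields, so the offsets would need adjusting there.

```python
import hashlib


def digest_without_timestamp(data):
    """Hash the raw bytes of a .pyc file, skipping bytes 5-8.

    Assumes the Python 2 header layout: bytes 1-4 are the magic
    number and bytes 5-8 are the timestamp.
    """
    h = hashlib.sha256()
    h.update(data[:4])  # magic number, identifies the Python version
    h.update(data[8:])  # everything after the four-byte timestamp
    return h.hexdigest()


def pyc_digest(path):
    """Read a .pyc file from disk and return its timestamp-independent digest."""
    with open(path, "rb") as f:
        return digest_without_timestamp(f.read())
```

Calling `pyc_digest("module.pyc")` (a hypothetical file name) on two builds made with the same interpreter should then give the same digest if the timestamp was the only thing that differed.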