We're doing some code cleanup. The cleanup only affects formatting (if it matters, let's even assume that line numbers don't change, though ideally I'd like to ignore line number changes as well).
To be sure that there is no accidental code change, I'd like to find a simple and fast way to compare the two versions of the source code.
So let's assume that I have file1.py and file2.py.
What works is to use py_compile.compile(filename) to create .pyc files, then run uncompyle6 pycfile on each of them, strip off the comments and compare the results. But this is overkill and very slow.
Another approach I imagined is to copy file1.py to, for example, file.py, call py_compile.compile("file.py") and save the .pyc file; then copy file2.py to file.py, call py_compile.compile("file.py") again and save that .pyc file; and finally compare the two generated .pyc files.
Would this work reliably with all (current) versions of Python >= 3.6?
If I remember correctly, at least for Python 2 the .pyc files could contain timestamps or absolute paths, which could make the comparison fail (at least if the .pyc files were generated on two different machines).
Is there a clean way to compare the bytecode of two .py files?
As a bonus (if possible), I'd like to create a hash of each file's bytecode that I could store for future reference.
You might try using Python's built-in compile function, which can compile from a string (read in from a file in your case). For example, the code below compiles and compares the code objects from two equivalent programs and one almost equivalent program and then, purely for demo purposes (something you would not want to do), executes a couple of the code objects:
import hashlib
import marshal

def compute_hash(code):
    code_bytes = marshal.dumps(code)
    code_hash = hashlib.sha1(code_bytes).hexdigest()
    return code_hash

source1 = """x = 3
y = 4
z = x * y
print(z)
"""
source2 = "x=3;y=4;z=x*y;print(z)"
source3 = "a=3;y=4;z=a*y;print(z)"
obj1 = compile(source=source1, filename='<string>', mode='exec', dont_inherit=1)
obj2 = compile(source=source2, filename='<string>', mode='exec', dont_inherit=1)
obj3 = compile(source=source3, filename='<string>', mode='exec', dont_inherit=1)
print(obj1 == obj2)
print(obj1 == obj3)
exec(obj1)
exec(obj3)
print(compute_hash(obj1))
Prints:
True
False
12
12
48632a1b64357e9d09d19e765d3dc6863ee67ab9
This saves you from having to copy .py files, create .pyc files, compare .pyc files, etc.
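Applied to files on disk, the same idea might look like the sketch below. It is only an illustration: the helper names bytecode_hash and same_bytecode and the "<cleanup-check>" label are made up for this example, and SHA-256 is an arbitrary choice of digest. Passing one fixed label as the filename keeps the real path out of the code object, which sidesteps the absolute-path concern from the question.

```python
import hashlib
import marshal


def bytecode_hash(path, label="<cleanup-check>"):
    """Compile the file at `path` and return a hex digest of its code object.

    `label` is a hypothetical placeholder used as the filename for every
    file, so the real path never ends up in the code object.
    """
    with open(path, "rb") as handle:
        source = handle.read()  # bytes: compile() honours an encoding declaration
    code = compile(source, label, "exec", dont_inherit=True)
    return hashlib.sha256(marshal.dumps(code)).hexdigest()


def same_bytecode(path_a, path_b):
    """Return True when both files compile to identically marshalled code."""
    return bytecode_hash(path_a) == bytecode_hash(path_b)
```

Calling same_bytecode("file1.py", "file2.py") then gives a yes/no answer without writing anything to disk. One caveat: the marshalled code object also carries line information (and, from Python 3.11 on, column positions), so a reformat that moves code around can change this hash even when the instructions are the same.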
Note:
The compute_hash function is there in case you need a repeatable hash, i.e. one that returns the same value for the same code object in successive program runs. This holds for a given Python version only, since both the bytecode and the marshal format can change between versions.
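If line number changes should be ignored as well, as the question ideally wants, one option is to hash only selected fields of the code object instead of its full marshalled form. The following is a rough sketch under stated assumptions, not a vetted tool: code_signature, layout_free_hash and the "<cleanup-check>" label are hypothetical names, and the list of fields is my guess at what is enough to detect a behavioural change.

```python
import hashlib
import types


def _const_signature(value):
    """Describe one constant in a way that is stable between runs."""
    if isinstance(value, types.CodeType):
        return code_signature(value)  # nested function, class body, lambda, ...
    if isinstance(value, tuple):
        return ("tuple", tuple(_const_signature(v) for v in value))
    if isinstance(value, frozenset):
        # Set iteration order can vary between runs, so sort the members.
        return ("frozenset", tuple(sorted(repr(v) for v in value)))
    return (type(value).__name__, repr(value))


def code_signature(code):
    """Return a nested tuple of the fields assumed to describe behaviour.

    Left out on purpose: co_filename, co_firstlineno and the line table.
    """
    return (
        code.co_name,
        code.co_argcount,
        code.co_kwonlyargcount,
        code.co_flags,
        code.co_code,
        code.co_names,
        code.co_varnames,
        code.co_freevars,
        code.co_cellvars,
        tuple(_const_signature(c) for c in code.co_consts),
    )


def layout_free_hash(source, label="<cleanup-check>"):
    """Hash the signature of the code compiled from `source` (str or bytes)."""
    code = compile(source, label, "exec", dont_inherit=True)
    text = repr(code_signature(code))
    return hashlib.sha256(text.encode("utf-8", "backslashreplace")).hexdigest()
```

With this, inserting blank lines or comment lines should leave the hash unchanged, while renaming a variable or changing an operator should not. Docstrings are stored as constants, so re-wrapping a docstring still counts as a change here.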