Python bytecode (.pyc) files have a header that starts with a magic number that changes between Python versions. How can I (programmatically) find out that number for the current Python version in order to generate a valid header? I'm currently hard-coding the one for Python 3.7.1, but that means I now depend on a specific Python version.
This answer does exactly what I want using py_compile.MAGIC, but that does not seem to exist anymore in Python 3. How can I do the equivalent in Python 3?
Here's an example of what I'm trying to do:
import dis
import marshal

PYC_HEADER = b'\x42\x0d\x0d\x0a\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'

def f():
    print('Hello World')

with open('test.pyc', 'wb') as pyc:
    pyc.write(PYC_HEADER)
    marshal.dump(dis.Bytecode(f).codeobj, pyc)
This should create a file test.pyc, which can then be run using the same Python interpreter as the script and should print "Hello World". And it does, but only when using Python 3.7. I'm looking for a way to generate the header for whichever version of Python 3 is used to run the script, rather than hard-coding 3.7.
For context:
I'm compiling a simple programming language to different bytecode formats (LLVM, Java bytecode, WebAssembly and now Python bytecode) as part of a planned tutorial series on compiler construction.
I can generate the Python bytecode using the byteasm library, which gives me a function as a result. But in order to write the contents to a .pyc file, I need a valid header. By hard-coding the header, the code will only work if the people following the tutorial are running the same version of Python 3 as I am (3.7), or they'd have to manually find out the magic number for their version.
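As of Python 3.4, there is importlib.util.MAGIC_NUMBER in the importlib.util module. Import the submodule explicitly, since a bare "import importlib" is not guaranteed to load it:
>>> import importlib.util
>>> importlib.util.MAGIC_NUMBER.hex()
'420d0d0a'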
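To tie this back to the example in the question, here is a minimal sketch of building the whole header instead of hard-coding it. The helper names build_pyc_header and write_pyc are made up for illustration. It assumes Python 3.4+ and the header layouts I'm aware of: magic, mtime and source size before 3.7, plus an extra 4-byte flags field after the magic from 3.7 on (PEP 552). The remaining fields are zeroed, just like in the hard-coded header.

```python
import importlib.util
import marshal
import struct
import sys


def build_pyc_header():
    """Return a .pyc header for the running interpreter (assumes Python 3.4+)."""
    header = importlib.util.MAGIC_NUMBER
    if sys.version_info >= (3, 7):
        # PEP 552 (Python 3.7+) added a 4-byte flags field; 0 means timestamp-based.
        header += struct.pack('<I', 0)
    # Source mtime and source size, zeroed as in the hard-coded header.
    header += struct.pack('<II', 0, 0)
    return header


def write_pyc(code, path):
    """Write a code object to `path` as a .pyc file (hypothetical helper)."""
    with open(path, 'wb') as pyc:
        pyc.write(build_pyc_header())
        marshal.dump(code, pyc)


def f():
    print('Hello World')


if __name__ == '__main__':
    # f.__code__ is the same code object that dis.Bytecode(f).codeobj returns.
    write_pyc(f.__code__, 'test.pyc')
```

The zeroed mtime and size should be fine when the file is run directly (python test.pyc), as in the question. If the file is meant to be imported next to a matching source file, the import system may consider it stale, so treat this as a starting point rather than a complete writer.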
Another solution, for Python < 3.4 or Python 2, is the get_magic function of the imp module.
>>> import imp
>>> imp.get_magic().hex()
'420d0d0a'
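Note that while imp still works in Python 3.7, it has been deprecated since Python 3.4 and was removed entirely in Python 3.12. Also, bytes.hex() only exists from Python 3.5 on; on older versions, binascii.hexlify(imp.get_magic()) gives the same digits.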
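If the script has to run on both old and new interpreters, the two approaches can be combined. The following is only a sketch: get_magic_number and to_hex are hypothetical helper names, and the fallback branch is something I'd only expect to be reached on interpreters older than 3.4.

```python
import binascii


def get_magic_number():
    """Return the 4-byte .pyc magic number of the running interpreter."""
    try:
        # Python 3.4+; on older versions this import raises ImportError.
        from importlib.util import MAGIC_NUMBER
        return MAGIC_NUMBER
    except ImportError:
        # Fallback for Python 2 and Python < 3.4 (imp was removed in 3.12).
        import imp
        return imp.get_magic()


def to_hex(data):
    """Hex-encode bytes without relying on bytes.hex() (Python 3.5+)."""
    return binascii.hexlify(data).decode('ascii')


if __name__ == '__main__':
    print(to_hex(get_magic_number()))
```

Trying importlib.util first means the deprecated imp module is never touched on versions where the newer constant is available.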