python, reverse-engineering, pyc, pyd

How hard is it to reverse engineer .pyd files?


After reading "How do I protect Python code?", I decided to try a really simple extension module on Windows. I have compiled my own extension module on Linux before, but this is the first time I have compiled one on Windows. I was expecting to get a .dll file, but instead I got a .pyd file. The docs say the two are essentially the same, except that a .pyd must have an init[insert-module-name]() function.

Is it safe to assume that .pyd files are as hard to reverse engineer as .dll files? If not, how hard are they to reverse engineer on a scale from .pyc files to .dll files?


Solution

  • They are, as you already found out, equivalent to DLL files with a certain structure. In principle, they are equally hard to reverse-engineer: they are machine code, need very little metadata, and the code may have been optimized beyond recognition.

    However, the required structure, and the knowledge that many functions will be handling PyObject *s and other well-defined CPython types, may have some effect. It won't really help with mapping the assembly code back to C (if anything, that gets harder due to CPython-specific macros). Code that mostly interacts with Python types will look quite different from code manipulating C structs, and will be comparatively bloated. This may make it even harder to comprehend, or it may give away code that does nothing interesting, allowing a reverse engineer to skip over it and get to your trade secrets sooner.

    None of these concerns apply to pieces of code which are pure C (i.e. code that does not interact with Python), and you probably have a lot of those. So it shouldn't make a significant difference in the end.
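
For contrast, here is a rough sketch of the other end of the scale the question asks about. A .pyc file is essentially a short header followed by a marshalled code object, so the snippet below round-trips a code object through `marshal` and disassembles it with `dis`. The module name `licensing.py`, the function `check_license`, and the string `"hunter2"` are made up for illustration, and the helper `iter_code_objects` is not a standard library function.

```python
import dis
import io
import marshal

# Hypothetical "secret" source, made up for this illustration.
SOURCE = '''
def check_license(key):
    secret = "hunter2"
    return key == secret
'''

# Compile to a module-level code object, as the interpreter does
# before it writes a .pyc file.
module_code = compile(SOURCE, "licensing.py", "exec")

# The body of a .pyc is this marshalled code object (after a short header).
payload = marshal.dumps(module_code)

# Everything below uses only the bytes in `payload`, not the source text.
recovered = marshal.loads(payload)


def iter_code_objects(code):
    """Yield a code object and every code object nested in its constants."""
    yield code
    for const in code.co_consts:
        if hasattr(const, "co_code"):
            yield from iter_code_objects(const)


# Function names, local variable names, and string constants survive intact.
recovered_names = {c.co_name for c in iter_code_objects(recovered)}
recovered_locals = {
    name for c in iter_code_objects(recovered) for name in c.co_varnames
}
recovered_strings = {
    const
    for c in iter_code_objects(recovered)
    for const in c.co_consts
    if isinstance(const, str)
}

# A readable bytecode listing is one standard-library call away.
buffer = io.StringIO()
dis.dis(recovered, file=buffer)
listing = buffer.getvalue()

if __name__ == "__main__":
    print(sorted(recovered_names))
    print(sorted(recovered_locals))
    print(sorted(recovered_strings))
    print(listing)
```

Names, local variables, and constants all come back verbatim, which is why a .pyc sits at the easy end of the scale. A compiled .pyd keeps none of the local names and expresses the logic as machine code, so the same exercise needs a native disassembler and considerably more effort.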