python, debugging, memory

Debugging seeming memory issues in Python


I have a Python script which deals with a fair amount of data using a fair amount of recursion, though not so much that it triggers a MemoryError or RecursionError. Whether this script runs to completion depends on how it's run, as well as on what seems to be random chance.

When it fails with an exception, the failure is always of a similar variety: a TypeError or AttributeError a dozen or so calls down a recursive chain, complaining about an operation that never actually takes place. For example,

TypeError: unsupported operand type(s) for |: 'function' and 'set'

where the left operand is never a function, and in particular not a function in the offending call, as confirmed in a debugger.
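For reference, this is the message CPython produces when a plain function really is the left operand of | with a set on the right. The sketch below reproduces it and wraps the operation in a hypothetical helper, checked_union (not part of the original script), which records the operand types and identities at the moment of failure so they can be compared with what the debugger shows afterwards. If the two disagree, the values changed between the failure and the inspection, which ordinary pure-Python code should not be able to do.

```python
def checked_union(left, right):
    """Hypothetical helper: compute left | right, annotating any TypeError."""
    try:
        return left | right
    except TypeError as exc:
        # Capture what the interpreter saw at the instant the operation failed.
        detail = (
            f"{exc} [left: {type(left).__name__} id={id(left):#x}, "
            f"right: {type(right).__name__} id={id(right):#x}]"
        )
        raise TypeError(detail) from exc


def placeholder():
    """Stand-in for a function object ending up where a set was expected."""


try:
    checked_union(placeholder, {1, 2})
except TypeError as exc:
    message = str(exc)

print(message)
```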

All of this madness points to nasty memory errors... somewhere (0xC0000005, for example, is a Windows access violation). Python is not a language that deals with cryptic memory issues often, unless there's an obvious low-level library mucking things up (this script is pure Python). Debugging is near impossible, as the debugger catches all of the mentioned exceptions but offers no explanation as to how they crept up. And there's no runaway memory leakage; the script is running (with trace) right now with a stable 1.6 GB footprint.

I found other answers indicating that PyCharm could be a culprit, and indeed its runner is at least partially responsible for early termination, but even the CLI is yielding odd results (why would it ever stop silently?). And trace is of no help, since with it the script magically succeeds, as if a watchful eye scares it into submission.
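One way to make a silent stop less silent is the standard-library faulthandler module, which dumps the Python traceback when the interpreter hits a fatal fault such as an access violation. The following is a minimal sketch, not something the original script did; the helper name enable_crash_tracebacks and the log path are hypothetical.

```python
import faulthandler


def enable_crash_tracebacks(log_path):
    """Hypothetical helper: write tracebacks for fatal faults to log_path.

    The returned file object must stay open for as long as faulthandler
    is enabled, so the caller should keep a reference to it.
    """
    stream = open(log_path, "w")
    faulthandler.enable(file=stream, all_threads=True)
    return stream


# Intended use, near the top of the script (path is an assumption):
#     crash_log = enable_crash_tracebacks("crash.log")
```

The same handler can be switched on without touching the code by running python -X faulthandler script.py, in which case the traceback goes to stderr. This only shows where the interpreter was when it died, not why.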

So, all of this to say that I'm not necessarily looking for assistance with this particular script; there's no need for anybody else to go digging through this mess. Instead, I'm looking for advice on debugging such memory-related errors in Python and, if possible, a description of what potential causes to look for.

Searching for existing answers about this stuff has proved extremely difficult; SO questions concerning 0xC0000005, for example, almost always have a library like PyTorch as the suspect. I've attempted reworking my script, and I think I've made it incrementally more efficient, but to no avail. This is such a particular and nasty problem, but I'm certain I'm not the only one to have faced it. Any information or places to find it would be greatly appreciated.


Solution

  • Much to the benefit of my sanity, the culprit was a faulty Intel chip (you know the ones). I seem to have even accelerated its degradation by doing this project, as most reports about it didn't really circulate until later in the year.
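
Since the cause turned out to be hardware, a rough sanity check for anyone in a similar spot is to run the same deterministic pure-Python computation many times and compare the results. The sketch below is an assumption-laden illustration, not what I actually ran: workload and count_mismatches are hypothetical names, and the arithmetic is arbitrary. On a healthy machine the count should be 0. A clean run proves nothing, though, and proper CPU and memory diagnostics remain the real test.

```python
import hashlib


def workload(seed, rounds=20000):
    """Deterministic pure-Python computation; returns a SHA-256 hex digest."""
    digest = hashlib.sha256()
    x = seed
    for _ in range(rounds):
        # Arbitrary 64-bit linear congruential step to keep the CPU busy.
        x = (x * 6364136223846793005 + 1442695040888963407) % (1 << 64)
        digest.update(x.to_bytes(8, "little"))
    return digest.hexdigest()


def count_mismatches(repeats=50, seed=12345):
    """Count how many repeated runs disagree with the first run."""
    reference = workload(seed)
    return sum(1 for _ in range(repeats) if workload(seed) != reference)


if __name__ == "__main__":
    print("mismatching runs:", count_mismatches())
```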