python python-3.x pickle memoization

Can recycled Python object IDs be a problem for a Pickler?


I read that Python will recycle IDs, meaning that a new object can end up with the ID of one that previously existed and was destroyed. I also read this about pickle:

The pickle module keeps track of the objects it has already serialized, so that later references to the same object won’t be serialized again. marshal doesn’t do this.

Suppose I keep a single Pickler instance open for several minutes, writing to one file as information comes in, and I discard each obj immediately after calling Pickler.dump(obj). Is there a risk that a new obj will be given the id of an object that has already been written to the same file, so that the wrong thing is accidentally written?


Solution

  • There is no risk of this occurring, because the memo dict holds a reference to the object in question. The object's lifetime does not end, and so its id cannot be reused, until the pickling is complete and the memo dictionary is cleaned up.

    Specifically, in the current implementation, the memoization dictionary maps the id of the object to a two-tuple: the first element is the index to write when the memoized object is encountered a second time, and the second element is the object itself. This is an intentional part of the design, per the comments within the function, which state:

    The object is stored in the Pickler memo so that transient objects are kept alive during pickling.
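
    The behaviour described above can be observed directly. The following is a minimal sketch, not a definitive test. It assumes CPython and uses pickle._Pickler, the pure-Python pickler, only because its memo is a plain dict that is easy to inspect; that name is private and an implementation detail. The variable names (probe, obj_id and so on) are made up for this illustration.

```python
import gc
import io
import pickle
import weakref

# pickle._Pickler is the pure-Python pickler; its memo is a plain dict.
# It is a private name, so treat this as an illustration of the current
# implementation rather than a stable API.
pickler = pickle._Pickler(io.BytesIO())

obj = {"transient", "payload"}  # a set: picklable and weak-referenceable
obj_id = id(obj)
probe = weakref.ref(obj)        # checks liveness without keeping obj alive

pickler.dump(obj)

# The memo entry is keyed by id(obj) and holds (memo index, the object itself).
memo_index, memo_obj = pickler.memo[obj_id]
memo_holds_same_object = memo_obj is obj

# Drop every reference we own; only the memo still refers to the set.
del obj, memo_obj
gc.collect()
alive_after_dump = probe() is not None

# Clearing the memo releases the object; only now could its id be reused.
pickler.clear_memo()
gc.collect()
alive_after_clear = probe() is not None

print(memo_holds_same_object, alive_after_dump, alive_after_clear)
```

    On CPython the three values are True, True and False: the set survives del because the memo still references it, and it is freed only once clear_memo() is called. So, as long as one Pickler is kept open, every object it has memoized stays alive and no new object can take its id.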