pythongeneratorpicklepython-stackless

Why can't generators be pickled?


Python's pickle (I'm talking standard Python 2.5/2.6/2.7 here) cannot pickle locks, file objects etc.

It also cannot pickle generators and lambda expressions (or any other anonymous code), because the pickle really only stores name references.

In case of locks and OS-dependent features, the reason why you cannot pickle them is obvious and makes sense.

But why can't you pickle generators?


Note: just for clarity -- I'm interested in the fundamental reason (or assumptions and choices that went into that design decision) why, not in "because it gives you a Pickle error".

I realize the question's a bit wide-aimed, so here's a rule of thumb of whether your answered it: "If these assumptions were raised, or the type of allowed generator somehow more restricted, would pickling generators work again?"


Solution

  • There is lots of information about this available. For the "official word" on the issue, read the (closed) Python bugtracker issue.

    The core reasoning, by one of the people who made the decision, is detailed on this blog:

    Since a generator is essentially a souped-up function, we would need to save its bytecode, which is not guarantee to be backward-compatible between Python’s versions, and its frame, which holds the state of the generator such as local variables, closures and the instruction pointer. And this latter is rather cumbersome to accomplish, since it basically requires to make the whole interpreter picklable. So, any support for pickling generators would require a large number of changes to CPython’s core.

    Now if an object unsupported by pickle (e.g., a file handle, a socket, a database connection, etc) occurs in the local variables of a generator, then that generator could not be pickled automatically, regardless of any pickle support for generators we might implement. So in that case, you would still need to provide custom __getstate__ and __setstate__ methods. This problem renders any pickling support for generators rather limited.

    And two suggested workarounds are mentioned:

    Anyway, if you need for a such feature, then look into Stackless Python which does all the above. And since Stackless’s interpreter is picklable, you also get process migration for free. This means you can interrupt a tasklet (the name for Stackless’s green threads), pickle it, send the pickle to a another machine, unpickle it, resume the tasklet, and voilà you’ve just migrated a process. This is freaking cool feature!

    But in my humble opinion, the best solution to this problem to the rewrite the generators as simple iterators (i.e., one with a __next__ method). Iterators are easy and efficient space-wise to pickle because their state is explicit. You would still need to handle objects representing some external state explicitly however; you cannot get around this.