In my code, I am generating classes at run time using the make_dataclass function.
The problem is that instances of such dataclasses cannot be serialized with pickle, as shown in the snippet below.
This matters for my application because those dataclass instances must be picklable: they need to be transferred to a multiprocessing pool of executors.
from dataclasses import dataclass, make_dataclass, asdict
import pickle

# Standard dataclass generated using the decorator approach
@dataclass
class StdDataClass:
    num: int = 0

a = StdDataClass(12)
ser_a = pickle.dumps(a)
des_a = pickle.loads(ser_a)
# Serialization and deserialization with pickle work
assert a.num == des_a.num

# Class created at run time using the make_dataclass approach.
# In the real case, the name, type and default value of each field are
# not available before the code is executed. The structure of this dataclass is
# known only at run time.
fields = [('num', int, 0)]
B = make_dataclass('B', fields)
b = B(2)
try:
    # An attempt to serialize the object triggers an exception:
    # Can't pickle <class 'types.B'>: attribute lookup B on types failed
    ser_b = pickle.dumps(b)
    des_b = pickle.loads(ser_b)
    assert b.num == des_b.num
except pickle.PickleError as e:
    print(e)
Serializing an instance of a class created with make_dataclass triggers an exception. I think this is actually to be expected, because the documentation says:
The following types can be pickled:
- built-in constants (None, True, False, Ellipsis, and NotImplemented);
- integers, floating-point numbers, complex numbers;
- strings, bytes, bytearrays;
- tuples, lists, sets, and dictionaries containing only picklable objects;
- functions (built-in and user-defined) accessible from the top level of a module (using def, not lambda);
- classes accessible from the top level of a module;
- instances of such classes whose the result of calling __getstate__() is picklable (see section Pickling Class Instances for details).
I think the problem is that class B is not defined at the top level of any module (the "classes accessible from the top level of a module" item above), and that this is why it fails, but I am not sure.
The workaround I have found is to convert the instance of the run-time dataclass into a dictionary, pickle the dictionary and, when needed, unpickle the dictionary and use it to recreate the dataclass instance.
# A base dataclass with no fields, but with a class-method "constructor"
# and a convenience method to convert the instance into a dictionary
@dataclass
class BaseDataClass:
    @classmethod
    def from_dict(cls, d: dict):
        # Pass the values to __init__, so fields without defaults also work
        return cls(**d)

    def to_dict(self):
        return asdict(self)

# Another dataclass created with the make_dataclass approach, but
# using BaseDataClass as its base.
C = make_dataclass('C', fields, bases=(BaseDataClass,))
c = C(13)

# WORKAROUND
#
# Instead of pickling the instance itself, I pickle the
# corresponding dictionary
ser_c = pickle.dumps(c.to_dict())
# Unpickle the dictionary and use it to recreate the dataclass instance
des_c = C.from_dict(pickle.loads(ser_c))
assert c.num == des_c.num
Even though the workaround works, I wonder whether it is possible to teach pickle to do the same for any dataclass derived from BaseDataClass.
I tried implementing the __reduce__ method, and also both __setstate__ and __getstate__, but with no luck.
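For reference, one way a `__reduce__` along these lines could work is to pickle the *recipe* for the class (its name and field specs) plus the field values, instead of the class object, and to rebuild the class on the receiving side with a module-level function. This is a minimal sketch under stated assumptions, not a tested library recipe: `ReducibleBase`, `make_reducible_dataclass`, `_rebuild` and `_CLASS_CACHE` are hypothetical names, and it assumes every field is given as a `(name, type)` or `(name, type, default)` tuple with a picklable type and a hashable, picklable default (no `default_factory`, no `init=False` fields).

```python
import pickle
from dataclasses import MISSING, fields, make_dataclass

# Hypothetical per-process cache: one class object per (name, field specs), so
# every instance rebuilt from the same recipe shares a single class.
_CLASS_CACHE = {}


class ReducibleBase:
    """Hypothetical base class for dataclasses created at run time.

    pickle normally stores a class by reference (module + qualified name), which
    fails for run-time classes. This __reduce__ stores a recipe instead: the
    class name, the field specs and the field values.
    """

    def __reduce__(self):
        specs = []
        for f in fields(self):
            if f.default is MISSING:
                specs.append((f.name, f.type))  # field without a default
            else:
                specs.append((f.name, f.type, f.default))
        values = {f.name: getattr(self, f.name) for f in fields(self)}
        return (_rebuild, (type(self).__name__, tuple(specs), values))


def make_reducible_dataclass(name, field_specs):
    """Create, or reuse from the cache, a dataclass based on ReducibleBase.

    Assumes each spec is a (name, type) or (name, type, default) tuple whose
    type is picklable and whose default is hashable and picklable.
    """
    key = (name, tuple(tuple(spec) for spec in field_specs))
    cls = _CLASS_CACHE.get(key)
    if cls is None:
        cls = make_dataclass(name, list(key[1]), bases=(ReducibleBase,))
        _CLASS_CACHE[key] = cls
    return cls


def _rebuild(name, field_specs, values):
    # Module-level function, so pickle can store it by reference.
    return make_reducible_dataclass(name, field_specs)(**values)


C = make_reducible_dataclass('C', [('num', int, 0)])
c = C(13)
des_c = pickle.loads(pickle.dumps(c))
print(des_c.num, des_c == c, type(des_c) is C)  # 13 True True
```

Because the reduction is defined on the class, no custom Pickler should be needed; the pickler that multiprocessing uses is a `pickle.Pickler` subclass and honours `__reduce__`. The receiving process only has to be able to import the module that defines `_rebuild`.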
I thought about subclassing Pickler to install a custom reducer, but that is the recommended approach when you cannot modify the class of the object being serialized (e.g. a class generated by an external library). Moreover, I do not know how to tell the multiprocessing module to use my Pickler subclass instead of the default one.
Do you have any idea how I can solve this issue?
Update
Actually, when I run your code as posted, I do not get an error trying to pickle b, because the class named 'B' returned by your call to make_dataclass can be accessed as attribute B of the current module. I suspect the code you posted may be an oversimplification of the actual code that is giving you the problem. The following does work:
>>> from dataclasses import dataclass, make_dataclass, asdict
>>> import pickle
>>>
>>> fields = [('num', int, 0)]
>>> B = make_dataclass('B', fields)
>>> B
<class '__main__.B'>
>>> b = B(2)
>>> ser_b = pickle.dumps(b)
>>> des_b = pickle.loads(ser_b)
>>> des_b.num
2
Here are two problematic cases:
>>> from dataclasses import dataclass, make_dataclass, asdict
>>> import pickle
>>>
>>> fields = [('num', int, 0)]
>>> new_class = make_dataclass('B', fields)
>>> new_class
<class '__main__.B'>
>>> b = new_class(2)
>>> ser_b = pickle.dumps(b)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
_pickle.PicklingError: Can't pickle <class '__main__.B'>: it's not the same object as __main__.B
or:
>>> from dataclasses import dataclass, make_dataclass, asdict
>>> import pickle
>>>
>>> def foo():
...     fields = [('num', int, 0)]
...     B = make_dataclass('B', fields)
...     B
...     b = B(2)
...     ser_b = pickle.dumps(b)
...
>>> foo()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 6, in foo
_pickle.PicklingError: Can't pickle <class '__main__.B'>: it's not the same object as __main__.B
In neither case can your class instance be pickled, because the class object cannot be referenced as sys.modules['__main__'].B.
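To see why, it can help to reproduce (roughly) the lookup pickle performs when it stores a class by reference: find the module named by `cls.__module__`, follow `cls.__qualname__` attribute by attribute, and check that the object found is the class itself. A minimal sketch; `reachable_by_reference`, `TopLevel` and `make_hidden` are hypothetical names, and the helper approximates pickle's check rather than calling pickle's internals (pickle imports the module, while this only looks in `sys.modules`). The exact error text varies between Python versions, so it is printed rather than asserted.

```python
import pickle
import sys
from dataclasses import dataclass, make_dataclass


def reachable_by_reference(cls):
    """Roughly mimic pickle's by-reference lookup for a class: take the module
    named cls.__module__, follow cls.__qualname__ attribute by attribute, and
    check that the object found is cls itself."""
    obj = sys.modules.get(cls.__module__)
    for part in cls.__qualname__.split("."):
        obj = getattr(obj, part, None)
        if obj is None:
            return False
    return obj is cls


@dataclass
class TopLevel:  # defined with a class statement at module top level
    num: int = 0


def make_hidden():
    # Run-time class that is never bound to a module-level name 'Hidden'
    return make_dataclass("Hidden", [("num", int, 0)])


hidden_cls = make_hidden()

for cls in (TopLevel, hidden_cls):
    try:
        pickle.dumps(cls(1))
        outcome = "pickled"
    except (pickle.PicklingError, AttributeError) as exc:
        outcome = f"failed: {exc}"
    print(cls.__qualname__, reachable_by_reference(cls), outcome)
```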
You can make either case work by creating a global attribute named 'B' that references your class named 'B'. For example:
from dataclasses import make_dataclass
import pickle

def foo():
    fields = [('num', int, 0)]
    B = make_dataclass('B', fields)
    b = B(2)
    globals()['B'] = B  # Make class B global
    ser_b = pickle.dumps(b)
    des_b = pickle.loads(ser_b)
    print(des_b.num, des_b == b)

foo()
Prints:
2 True
See the globals built-in function.
And if you had (whether at global scope or not):
new_class = make_dataclass('B', fields)
Then you would need:
globals()['B'] = new_class
Update 2
On Python 3.8, your code does fail, and you need the following instead:
from dataclasses import dataclass, make_dataclass, asdict
import pickle
import types
fields = [('num', int, 0)]
B = make_dataclass('B', fields)
b = B(2)
types.B = B
ser_b = pickle.dumps(b)
des_b = pickle.loads(ser_b)
print(des_b.num)
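Both updates apply the same rule: bind the class, under its `__qualname__`, in the module named by its `__module__`, because that is where pickle looks it up (`__main__` in the first update, `types` on Python 3.8). A version-independent helper can read those attributes instead of hard-coding the module. This is a minimal sketch; `make_picklable_dataclass` and `_BOUND_BY_HELPER` are hypothetical names. It assumes the receiving process has created the same class under the same name before unpickling (for example, a worker started with fork, or one that runs the same creation code), and it refuses to shadow an unrelated existing attribute, such as a function in the `types` module. Rebinding a name to a new class means instances of the old class can no longer be pickled.

```python
import pickle
import sys
from dataclasses import make_dataclass

# (module name, qualname) pairs this helper has bound, so it may rebind them
# later without shadowing unrelated attributes.
_BOUND_BY_HELPER = set()


def make_picklable_dataclass(name, field_specs, **kwargs):
    """Hypothetical helper: create a dataclass and bind it where pickle looks
    for it, i.e. as attribute cls.__qualname__ of module cls.__module__."""
    cls = make_dataclass(name, field_specs, **kwargs)
    module = sys.modules[cls.__module__]
    key = (cls.__module__, cls.__qualname__)
    if hasattr(module, cls.__qualname__) and key not in _BOUND_BY_HELPER:
        # Refuse to overwrite a name this helper did not create.
        raise ValueError(f"{cls.__module__}.{cls.__qualname__} already exists")
    setattr(module, cls.__qualname__, cls)
    _BOUND_BY_HELPER.add(key)
    return cls


Point = make_picklable_dataclass("Point", [("x", int, 0), ("y", int, 0)])
p = Point(3, 4)
q = pickle.loads(pickle.dumps(p))
print(q, q == p)  # Point(x=3, y=4) True
```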