Thanks to David Beazley's slides on generators, I'm quite taken with using generators for data processing to keep memory consumption minimal. Now I'm working on my first kedro project, and my question is how I can use generators in kedro. When I have a node that returns a generator and run it with kedro run --node=example_node, I get the following error:
DataSetError: Failed while saving data to data set MemoryDataSet().
can't pickle generator objects
Am I supposed to always load all my data into memory when working with kedro?
Hi @ilja, to do this you may need to change the type of assignment operation that MemoryDataSet applies.
In your catalog, declare your datasets explicitly and change the copy_mode to one of copy or assign. I think assign may be your best bet here...
https://kedro.readthedocs.io/en/stable/kedro.io.MemoryDataSet.html
I hope this works, but am not 100% sure.
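To illustrate why assign looks like the likely candidate, here is a rough stdlib-only sketch. It does not use kedro at all: store() is a hypothetical stand-in for what an in-memory data set might do on save under each copy_mode, and read_chunks() is a made-up node output. The assumption is that the default mode for plain Python objects behaves like a deep copy, which matches the pickle error above.

```python
import copy


def read_chunks(n_chunks):
    """Hypothetical node output: lazily yield small chunks of data."""
    for i in range(n_chunks):
        yield [i, i + 1]


def store(data, copy_mode):
    """Simplified stand-in (an assumption, not kedro code) for saving
    data to an in-memory data set under a given copy_mode."""
    if copy_mode == "deepcopy":
        return copy.deepcopy(data)
    if copy_mode == "copy":
        return copy.copy(data)
    if copy_mode == "assign":
        return data  # hand over the very same object, no copying
    raise ValueError(f"unknown copy_mode: {copy_mode}")


# Try each mode with a fresh generator and record what happens.
results = {}
for mode in ("deepcopy", "copy", "assign"):
    try:
        store(read_chunks(3), mode)
        results[mode] = "ok"
    except TypeError as exc:
        results[mode] = f"TypeError: {exc}"

for mode, outcome in results.items():
    print(f"{mode}: {outcome}")

# With plain assignment the generator survives, but it can be consumed once.
stored = store(read_chunks(3), "assign")
first_pass = list(stored)
second_pass = list(stored)  # already exhausted at this point
print(first_pass, second_pass)
```

In this sketch only plain assignment accepts a generator, since the copy module cannot duplicate generator objects. One thing to watch if this carries over to kedro: the stored generator is exhausted after the first consumer, so a second node reading the same data set would see nothing.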