A follow up from this question; Best way to save a dict of awkward1 arrays?
To save multiple columns of nested awkward1 arrays (with varying length);
import awkward1 as ak
dog = ak.from_iter([[1, 2], [5]])
cat = ak.from_iter([[4]])
pets = ak.zip({"dog": dog[np.newaxis], "cat": cat[np.newaxis]}, depth_limit=1)
ak.to_parquet(pets, "pets.parquet")
Unfortunately, this doesn't seem to work for flat lists;
import awkward1 as ak
dog = ak.from_iter([1, 2, 5])
cat = ak.from_iter([4])
pets = ak.zip({"dog": dog[np.newaxis], "cat": cat[np.newaxis]}, depth_limit=1)
ak.to_parquet(pets, "pets.parquet")
creates the error;
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-31-7f3a7fefb261> in <module>
3 cat = ak.from_iter([3])
4 pets = ak.zip({"dog": dog[np.newaxis], "cat": cat[np.newaxis]}, depth_limit=1)
----> 5 ak.to_parquet(pets, "pets.parquet")
~/Programs/anaconda3/envs/tree/lib/python3.7/site-packages/awkward/operations/convert.py in to_parquet(array, where, explode_records, list_to32, string_to32, bytestring_to32, **options)
2983 layout = to_layout(array, allow_record=False, allow_other=False)
2984 iterator = batch_iterator(layout)
-> 2985 first = next(iterator)
2986
2987 if "schema" not in options:
~/Programs/anaconda3/envs/tree/lib/python3.7/site-packages/awkward/operations/convert.py in batch_iterator(layout)
2978 )
2979 yield pyarrow.RecordBatch.from_arrays(
-> 2980 pa_arrays, schema=pyarrow.schema(pa_fields)
2981 )
2982
~/Programs/anaconda3/envs/tree/lib/python3.7/site-packages/pyarrow/table.pxi in pyarrow.lib.RecordBatch.from_arrays()
TypeError: object of type 'pyarrow.lib.Tensor' has no len()
What is the reason for encountering this error?
What you found is a bug, and now it is fixed: https://github.com/scikit-hep/awkward-1.0/pull/799
What's happening here is that pyarrow can't write pyarrow.lib.Tensor
(regular-length lists, such as the one you created with np.newaxis
) to Parquet files. Parquet files don't have a concept of "regular-length list," so that makes sense. But rather than converting it, pyarrow hits an unhandled case, in which it fails to find the length of that pyarrow.lib.Tensor
. (It's a little odd that pyarrow.lib.Tensor
doesn't have a __len__
method, but that's another thing.)
Anyway, with version 1.2.0 of Awkward Array, we'll simply convert regular-length lists into (in principle) variable-length lists when writing to Parquet, since the format doesn't have that type. According to the schedule, version 1.2.0 will be released tomorrow. (This bug-fix is likely the last prerelease.)