Suppose I have converted a simple two-column dataframe to a NumPy array:
gdf.head()
>>>
rid rast
0 1 01000001000761C3ECF420013F0761C3ECF42001BF7172...
1 2 01000001000761C3ECF420013F0761C3ECF42001BF64BF...
2 3 01000001000761C3ECF420013F0761C3ECF42001BF560C...
3 4 01000001000761C3ECF420013F0761C3ECF42001BF7F25...
4 5 01000001000761C3ECF420013F0761C3ECF42001BF7172...
raster_np = gdf.to_numpy()
raster_np[0]
>>> array([1, '01000001000761C3E.........'], dtype=object)
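For context, here is a small sketch of why the converted array has dtype=object. It assumes pandas and NumPy, and the frame is a hypothetical stand-in for gdf with placeholder strings rather than real raster data. A frame that mixes an integer column and a string column has no common numeric dtype, so to_numpy() falls back to object. Converting each column on its own keeps its native dtype. The sketch also shows that one index gives a row and two indices give a single cell.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for gdf: an integer id column and a hex-string column
# (placeholder strings, not real raster data).
df = pd.DataFrame({"rid": [1, 2, 3], "rast": ["0100AA", "0100BB", "0100CC"]})

# Converting the whole frame at once forces a common dtype: object.
whole = df.to_numpy()
print(whole.dtype)   # object
print(whole[0])      # first row: [1 '0100AA']
print(whole[0][0])   # single cell: 1

# Converting column by column keeps each column's own dtype.
rid = df["rid"].to_numpy()
rast = df["rast"].to_numpy()
print(rid.dtype)     # int64
```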
I've been tasked with converting the NumPy array to the Zarr file format. Because of the size of the rast values and of the dataframe, chunking and compression might be necessary, and I assume the resulting .zarr files would also be better suited to an S3/cloud storage environment. I created a simple Zarr array like so:
import zarr
z_test = zarr.zeros(shape=(10000, 2), chunks=(10000, 2))
z_test
>>> <zarr.core.Array (10000, 2) float64>
Now, how do I get the data in raster_np into z_test and retain the Zarr attributes? Simply using z_test = raster_np obviously doesn't work, since that just rebinds the name z_test to the NumPy array. Perhaps there is something I am misunderstanding about Zarr. Any suggestions?
z_test = zarr.array(raster_np)
See https://zarr.readthedocs.io/en/stable/api/creation.html#zarr.creation.array
and https://zarr.readthedocs.io/en/stable/api/hierarchy.html#zarr.hierarchy.Group.array