I would like to ask how to use dd.map_partitions with the h3.string_to_h3 function. My dataframe looks like this:
| | h3 | lat | lon | x | y | elevation |
---|---|---|---|---|---|---|
2 | 8ca80c8e91015ff | -23.068134 | -52.042272 | 393235.906794 | 7.448557e+06 |
3 | 8ca80c8ecadd1ff | -23.095896 | -52.031107 | 394401.401086 | 7.445492e+06 |
4 | 8ca80cbb455b1ff | -23.052007 | -52.055948 | 391822.030340 | 7.450333e+06 |
5 | 8ca80cbb6a06dff | -23.045227 | -52.049591 | 392468.007662 | 7.451088e+06 |
6 | 8ca80c85876e9ff | -23.077720 | -52.085169 | 388849.315388 | 7.447464e+06 |
If this were pandas, I could simply use apply to get the hexagon index: df['h3'].apply(h3.string_to_h3). But what if I have a large dataset and would like to use dd.map_partitions?
I have tried df['h3'].apply(h3.string_to_h3), df['h3'].map_partitions(h3.string_to_h3, meta={'hexagons':'int64'}), and df['h3'].map_partitions(h3.string_to_h3, axis=1, meta={'hexagons':'int64'}). None of them work.
Could someone tell me how to resolve this issue?
Thanks
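I think map_partitions does what it says on the tin: it applies a function that accepts a whole partition (a pandas DataFrame or Series) as input, not a single value. That is why passing h3.string_to_h3 directly fails: it expects one string, not a Series. You can instead manipulate the partition itself inside the function you pass.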
I haven't tested the code below, but I believe this should work:
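import h3
import numpy as np

# Each partition is a pandas DataFrame, so convert its 'h3' column element-wise.
df['h3'] = df.map_partitions(
    lambda partition: partition['h3'].apply(h3.string_to_h3),
    meta=('h3', np.uint64),
)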
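Here is a rough variant of the snippet above with the per-partition step pulled out into a named function, so it can be checked on plain pandas first. A few things in it are my own assumptions rather than anything from the question: hex_to_int is a hypothetical stand-in for h3.string_to_h3 (an H3 string is the hexadecimal form of the 64-bit index, so int(cell, 16) should give the same value), and convert_partition, add_h3_int and the h3_int column name are made up for illustration. Treat it as a sketch, not a tested recipe.

```python
import pandas as pd


def hex_to_int(cell: str) -> int:
    # Hypothetical stand-in for h3.string_to_h3: parse the hex string as an integer.
    return int(cell, 16)


def convert_partition(partition: pd.DataFrame) -> pd.Series:
    # Receives one whole partition (a pandas DataFrame) and returns a Series
    # with one integer per row; the explicit cast keeps the dtype in line with meta.
    return partition["h3"].apply(hex_to_int).astype("uint64")


def add_h3_int(ddf):
    # ddf is assumed to be a dask DataFrame with a string column named 'h3'.
    # meta tells dask the name and dtype of the Series each partition returns.
    converted = ddf.map_partitions(convert_partition, meta=("h3", "uint64"))
    return ddf.assign(h3_int=converted)


# Sanity check on plain pandas before involving dask.
pdf = pd.DataFrame({"h3": ["8ca80c8e91015ff", "8ca80c8ecadd1ff"]})
result = convert_partition(pdf)
```

With dask installed, something like add_h3_int(dd.from_pandas(pdf, npartitions=2)).compute() should give the same values in a new h3_int column. Swapping hex_to_int for h3.string_to_h3 gets you back to the original snippet.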