Say I have:
import pyarrow as pa
arr = pa.array([1, 3, 2, 2, 1, 3])
I'd like to replace values according to {1: 'one', 2: 'two', 3: 'three'}
and to end up with:
<pyarrow.lib.LargeStringArray object at 0x7f8dd0b3c820>
[
"one",
"three",
"two",
"two",
"one",
"three"
]
I can do this by going via Polars:
In [19]: pl.from_arrow(arr).replace_strict({1: 'one', 2: 'two', 3: 'three'}, return_dtype=pl.String).to_arrow()
Out[19]:
<pyarrow.lib.LargeStringArray object at 0x7f8dd0b3c820>
[
"one",
"three",
"two",
"two",
"one",
"three"
]
Is there a way to do it with just PyArrow?
The functions in the pyarrow.compute module can do what you are after.
A trivial example is below:
$ cat /tmp/tomap.py
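import pyarrow as pa
import pyarrow.compute as pc  # import the submodule explicitly rather than relying on pa.compute
maps = {1: "one", 2: "two", 5: "five", 3: "three", 4: "four"}
numbersArr = pa.array([1, 3, 2, 2, 1, 3, 4, 5, 4, 1])
# position of each number within the list of keys (null if the number is not a key)
idxs = pc.index_in(numbersArr, pa.array(list(maps.keys())))
# look up the word stored at each of those positions
wordsArr = pc.take(pa.array(list(maps.values())), idxs)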
print(wordsArr)
$ python /tmp/tomap.py
[
"one",
"three",
"two",
"two",
"one",
"three",
"four",
"five",
"four",
"one"
]
Hope this is helpful. See https://arrow.apache.org/docs/python/compute.html for details.