
Replace all values in array according to mapping


Say I have:

import pyarrow as pa

arr = pa.array([1, 3, 2, 2, 1, 3])

I'd like to replace values according to {1: 'one', 2: 'two', 3: 'three'} and to end up with:

<pyarrow.lib.LargeStringArray object at 0x7f8dd0b3c820>
[
  "one",
  "three",
  "two",
  "two",
  "one",
  "three"
]

I can do this by going via Polars:

In [19]: pl.from_arrow(arr).replace_strict({1: 'one', 2: 'two', 3: 'three'}, return_dtype=pl.String).to_arrow()
Out[19]:
<pyarrow.lib.LargeStringArray object at 0x7f8dd0b3c820>
[
  "one",
  "three",
  "two",
  "two",
  "one",
  "three"
]

Is there a way to do it with just PyArrow?


Solution

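  • Functions in the pyarrow.compute module can do what you are after: index_in finds the position of each value among the mapping's keys, and take then picks the mapped value at that position.

    A trivial example: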

    $ cat /tmp/tomap.py
    import pyarrow as pa
    import pyarrow.compute as pc
    
    maps = {1: "one", 2: "two", 5: "five", 3: "three", 4: "four"}
    
    numbersArr = pa.array([1, 3, 2, 2, 1, 3, 4, 5, 4, 1])
    
    # position of each number within the mapping's keys (null if not found)
    idxs = pc.index_in(numbersArr, value_set=pa.array(list(maps.keys())))
    
    # pick the mapped value at each of those positions
    wordsArr = pc.take(pa.array(list(maps.values())), idxs)
    
    print(wordsArr)
    
    $ python /tmp/tomap.py
    [
      "one",
      "three",
      "two",
      "two",
      "one",
      "three",
      "four",
      "five",
      "four",
      "one"
    ]
    

    Hope this helps. See https://arrow.apache.org/docs/python/compute.html for details.