python pyarrow

How to zip together two PyArrow arrays?


In Polars, I can use zip_with to take values from s1 or s2 according to a mask:

In [1]: import polars as pl

In [2]: import pyarrow as pa

In [3]: import pyarrow.compute as pc

In [4]: s1 = pl.Series([1,2,3])

In [5]: mask = pl.Series([True, False, False])

In [6]: s2 = pl.Series([4, 5, 6])

In [7]: s1.zip_with(mask, s2)
Out[7]:
shape: (3,)
Series: '' [i64]
[
        1
        5
        6
]

How can I do this with PyArrow? I've tried pyarrow.compute.replace_with_mask but that works differently:

In [10]: import pyarrow.compute as pc

In [11]: import pyarrow as pa

In [12]: a1 = pa.array([1,2,3])

In [13]: mask = pa.array([True, False, False])

In [14]: a2 = pa.array([4,5,6])

In [15]: pc.replace_with_mask(a1, pc.invert(mask), a2)
Out[15]:
<pyarrow.lib.Int64Array object at 0x7f69d411afe0>
[
  1,
  4,
  5
]

How to replicate zip_with in PyArrow?


Solution

  • You can use PyArrow's if_else compute function:

    import pyarrow as pa
    import pyarrow.compute as pc
    
    # Your input arrays
    a1 = pa.array([1, 2, 3])
    mask = pa.array([True, False, False])
    a2 = pa.array([4, 5, 6])
    
    # Apply the if_else function
    result = pc.if_else(mask, a1, a2)
    
    print(result)
    

    [
      1,
      5,
      6
    ]

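    pc.if_else(mask, a1, a2) selects element by element: where mask is true it takes the value from a1 at that position, and where mask is false it takes the value from a2 at the same position. That is the behavior of Polars' zip_with. replace_with_mask differs because it fills the masked slots with the replacement values in order, starting from the first one, rather than by position.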