In Polars, I can use zip_with to take values from s1 where a mask is True and from s2 where it is False:
In [1]: import polars as pl
In [2]: import pyarrow as pa
In [3]: import pyarrow.compute as pc
In [4]: s1 = pl.Series([1,2,3])
In [5]: mask = pl.Series([True, False, False])
In [6]: s2 = pl.Series([4, 5, 6])
In [7]: s1.zip_with(mask, s2)
Out[7]:
shape: (3,)
Series: '' [i64]
[
1
5
6
]
How can I do this with PyArrow? I've tried pyarrow.compute.replace_with_mask, but that works differently:
In [10]: import pyarrow.compute as pc
In [11]: import pyarrow as pa
In [12]: a1 = pa.array([1,2,3])
In [13]: mask = pa.array([True, False, False])
In [14]: a2 = pa.array([4,5,6])
In [15]: pc.replace_with_mask(a1, pc.invert(mask), a2)
Out[15]:
<pyarrow.lib.Int64Array object at 0x7f69d411afe0>
[
1,
4,
5
]
How can I replicate zip_with in PyArrow?
You can use PyArrow's if_else compute function:
import pyarrow as pa
import pyarrow.compute as pc
# Your input arrays
a1 = pa.array([1, 2, 3])
mask = pa.array([True, False, False])
a2 = pa.array([4, 5, 6])
# Apply the if_else function
result = pc.if_else(mask, a1, a2)
print(result)
[
  1,
  5,
  6
]
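The if_else function takes each element from a1 where mask is True and from a2 where it is False, which is the same positional selection that Polars' s1.zip_with(mask, s2) performs. Note that the argument order differs: in if_else the mask comes first.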