I have two dataframes with identical shape and want to update df1 with df2 if some conditions are met
import pandas as pd
from typing import Any
df1 = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
print(df1, "\n")
df2 = pd.DataFrame({"A": [7, 8, 9], "B": [10, 3, 12]})
print(df2, "\n")
# Define a condition function
def condition(x: Any) -> bool:
"""condition function to update only cells matching the conditions"""
return True if x in [2, 7, 9] else False
df1.update(df2)
print(df1)
but if I use filter_func df1.update(df2,filter_func=condition)
it fails with ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Unfortunately the docu is not very verbose.
How to update a dataframe with conditions correctly?
Your function will receive a 1D numpy array (per column), it should be vectorized and return a boolean 1D array (callable(1d-array) -> bool 1d-array).
Use numpy.isin
to test membership:
def condition(x):
"""condition function to update only cells matching the conditions"""
return np.isin(x, [2, 7, 9])
df1.update(df2, filter_func=condition)
with a lambda:
df1.update(df2, filter_func=lambda x: np.isin(x, [2, 7, 9]))
Alternatively, if you really can't vectorize with pure numpy functions (this should not be done here!), decorate with numpy.vectorize
:
@np.vectorize
def condition(x: Any) -> bool:
"""condition function to update only cells matching the conditions"""
return True if x in [2, 7, 9] else False
df1.update(df2, filter_func=condition)
Updated df1
:
A B
0 1 4
1 8 5
2 3 6