In the following toy example, I'm trying to add a status column based on the results of an outer merge. The challenge is to preserve the method-chaining style, as best described in Tom's blog. The commented-out line is my attempt at it, but it does not work.
import pandas as pd
# Create sample data frames A and B
A = pd.DataFrame({
    'key': ['A', 'B', 'C', 'D'],
    'value': [1, 2, 3, 4]
})
B = pd.DataFrame({
    'key': ['C', 'D', 'E', 'F'],
    'value': [3, 4, 5, 6]
})
# Merge data frames A and B on the 'key' column and add an indicator column
merged = pd.merge(A, B, on='key', how='outer', indicator=True)
# add a status column
#{'both':'no change',
#'left_only': 'added',
#'right_only': 'removed'}
merged = (merged
    .assign(status='no change')
    #.assign(status = lambda x: x.loc[x._merge == 'left_only'], 'added')
    .drop('_merge', axis=1)
)
Something like this should suffice. In general, since you are assigning values that depend on a condition, a .loc slice inside assign won't do it; you need a conditional construct instead (map, np.where, np.select, Series.where, etc.):
(A
    .merge(B, on='key', how='outer', indicator=True)
    .assign(status=lambda f: f._merge.map({"left_only": "added",
                                           "both": "no change",
                                           "right_only": "removed"}))
)
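For comparison, here is a rough sketch of the same chain using np.select, one of the other options mentioned above. It reuses the question's sample frames and label mapping; the variable name `result` and the use of `default='no change'` for the 'both' rows are my own choices for illustration, not something from the original post. It also drops `_merge` at the end, as the question's chain does.

```python
import numpy as np
import pandas as pd

# Sample frames copied from the question
A = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [1, 2, 3, 4]})
B = pd.DataFrame({'key': ['C', 'D', 'E', 'F'], 'value': [3, 4, 5, 6]})

result = (
    A
    .merge(B, on='key', how='outer', indicator=True)
    # np.select picks the first choice whose condition is True for each row;
    # rows matching neither condition (i.e. 'both') fall through to the default.
    .assign(status=lambda f: np.select(
        [f['_merge'] == 'left_only', f['_merge'] == 'right_only'],
        ['added', 'removed'],
        default='no change',
    ))
    .drop(columns='_merge')
)

print(result)
```

map is the more compact choice when every indicator value translates directly to one label; np.select is probably more useful once the conditions involve more than the `_merge` column.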