Is it possible to compare two DataFrames that have different labels and still get the same kind of output as df.compare(df2)?
I tried df.compare(df2) on the frames below ('L' with 'R', and 'L' with 'N'), but received the following error message:
ValueError: Can only compare identically-labeled (both index and columns) DataFrame objects
```
import pandas as pd

L = pd.DataFrame(
    {
        'FellowshipID': [1001, 1002, 1003, 1004],
        'FirstName': ['Frodo', 'Samwise', 'Gandalf', 'Pippin'],
        'Skills': ['Hiding', 'Gardening', 'Spells', 'Fireworks']
    }
)
R = pd.DataFrame(
    {
        'FellowshipID': [1001, 1002, 1006, 1007, 1008],
        'FirstName': ['Frodo', 'Samwise', 'Legolas', 'Elrond', 'Barromir'],
        'Age': [50, 39, 2931, 6520, 51]
    }
)
N = pd.DataFrame(
    data={
        'Relation': ['fri 1', 'fri 2', 'fri 3', 'fri 4'],
        'Name': ['John', 'Jane', 'Adam', 'Omar'],
        'Char': ['Hw', 'Amb', 'Adv', 'Sprt']
    }
)
```
Thank you :-)
`compare` requires the two inputs to have the same indices/columns.
One option is to use `reindex_like`, which will align Right to Left:
```
out = L.compare(R.reindex_like(L))

#    FellowshipID          FirstName              Skills
#            self   other       self    other       self  other
# 0           NaN     NaN        NaN      NaN     Hiding    NaN
# 1           NaN     NaN        NaN      NaN  Gardening    NaN
# 2        1003.0  1006.0    Gandalf  Legolas     Spells    NaN
# 3        1004.0  1007.0     Pippin   Elrond  Fireworks    NaN
```
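To see why this first option ignores R's extra data, it may help to look at what `reindex_like` returns on its own. The following is a minimal sketch using the L and R frames from the question; `aligned` is just an illustrative name, not part of the original answer:

```python
import pandas as pd

L = pd.DataFrame({
    'FellowshipID': [1001, 1002, 1003, 1004],
    'FirstName': ['Frodo', 'Samwise', 'Gandalf', 'Pippin'],
    'Skills': ['Hiding', 'Gardening', 'Spells', 'Fireworks'],
})
R = pd.DataFrame({
    'FellowshipID': [1001, 1002, 1006, 1007, 1008],
    'FirstName': ['Frodo', 'Samwise', 'Legolas', 'Elrond', 'Barromir'],
    'Age': [50, 39, 2931, 6520, 51],
})

# Align R to L's labels: only L's index and columns are kept.
aligned = R.reindex_like(L)

print(aligned.columns.tolist())        # ['FellowshipID', 'FirstName', 'Skills']
print(aligned.shape)                   # (4, 3)

# R has no 'Skills' column, so it is filled with NaN;
# R's 'Age' column and its fifth row (index 4) are dropped.
print(aligned['Skills'].isna().all())  # True
```

So with this approach, anything that exists only in R never reaches `compare`, which is why 'Age' and row 4 are absent from the result.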
Another is to `reindex` both inputs on the `union` of their indices/columns:
```
def cust_compare(L, R):
    idx = L.index.union(R.index)
    cols = L.columns.union(R.columns)
    return (L.reindex(index=idx, columns=cols)
             .compare(R.reindex(index=idx, columns=cols))
           )

out = cust_compare(L, R)

#     Age         FellowshipID          FirstName               Skills
#    self  other          self   other       self     other       self  other
# 0   NaN     50           NaN     NaN        NaN       NaN     Hiding    NaN
# 1   NaN     39           NaN     NaN        NaN       NaN  Gardening    NaN
# 2   NaN   2931        1003.0  1006.0    Gandalf   Legolas     Spells    NaN
# 3   NaN   6520        1004.0  1007.0     Pippin    Elrond  Fireworks    NaN
# 4   NaN     51           NaN  1008.0        NaN  Barromir        NaN    NaN
```
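The same helper should also cover the 'L' with 'N' case from the question, where the two frames share no columns at all. A sketch, with `cust_compare` repeated so the snippet runs on its own:

```python
import pandas as pd

def cust_compare(L, R):
    # Reindex both frames on the union of their labels before comparing.
    idx = L.index.union(R.index)
    cols = L.columns.union(R.columns)
    return (L.reindex(index=idx, columns=cols)
             .compare(R.reindex(index=idx, columns=cols)))

L = pd.DataFrame({
    'FellowshipID': [1001, 1002, 1003, 1004],
    'FirstName': ['Frodo', 'Samwise', 'Gandalf', 'Pippin'],
    'Skills': ['Hiding', 'Gardening', 'Spells', 'Fireworks'],
})
N = pd.DataFrame({
    'Relation': ['fri 1', 'fri 2', 'fri 3', 'fri 4'],
    'Name': ['John', 'Jane', 'Adam', 'Omar'],
    'Char': ['Hw', 'Amb', 'Adv', 'Sprt'],
})

out = cust_compare(L, N)

# Each column exists in only one of the frames, so every cell is
# reported as a difference: 4 rows, 6 columns x (self, other).
print(out.shape)  # (4, 12)
```

Here one side of every self/other pair is NaN, so the result is mostly a side-by-side view of the two frames rather than a list of meaningful differences.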