I have a data frame with two columns (name1, name2). I would like to use a dictionary of column names and loop over it to check whether the values are the same, and specifically show the values that are not the same.
When I try the following, I get an error: "ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()"
import pandas as pd

# create df
test2 = {'NAME1': ['Tom', 'nick', 'krish', 'jack'],
         'NAME': ['Tom', 'nick', 'Carl', 'Bob']}
dfx = pd.DataFrame(test2)
# create dictionary
thisdict = {
    "NAME1": "NAME"
}
# loop and display differences
for a, b in thisdict.items():
    if dfx[a] != dfx[b]:
        x = dfx[[a, b]]
        print(x)
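The comparison dfx[a] != dfx[b] returns a boolean Series with one value per row, and an if statement cannot reduce that to a single True or False, hence the error. Instead, compare the columns row by row and use the result as a filter to keep only the rows where the two values differ. Try something like this: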
# Loop and display differences
for a, b in thisdict.items():
    # Compare the columns row by row and keep only the rows that differ
    mismatches = dfx[dfx[a] != dfx[b]]
    if not mismatches.empty:
        print(f"Mismatches between '{a}' and '{b}':")
        print(mismatches[[a, b]])
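If you have several column pairs to check, the same idea can be wrapped in a small helper. This is only a sketch: find_mismatches is a hypothetical name of my own, not a pandas function, and it assumes both columns of each pair exist in the data frame. It builds the boolean mask once, uses .any() to turn it into a single True/False (which is what the error message is hinting at), and collects the differing rows per pair.

```python
import pandas as pd


def find_mismatches(df, column_pairs):
    """Hypothetical helper: map each (left, right) pair to the rows where they differ."""
    result = {}
    for left, right in column_pairs.items():
        # Element-wise comparison: a boolean Series with one entry per row
        mask = df[left] != df[right]
        # .any() explicitly reduces the Series to a single bool
        if mask.any():
            result[(left, right)] = df.loc[mask, [left, right]]
    return result


# Same sample data as in the question
dfx = pd.DataFrame({
    'NAME1': ['Tom', 'nick', 'krish', 'jack'],
    'NAME': ['Tom', 'nick', 'Carl', 'Bob'],
})
thisdict = {"NAME1": "NAME"}

mismatches = find_mismatches(dfx, thisdict)
for (left, right), rows in mismatches.items():
    print(f"Mismatches between '{left}' and '{right}':")
    print(rows)
```

With the sample data this should report only the rows at index 2 and 3 (krish/Carl and jack/Bob). One thing to watch for: NaN != NaN evaluates to True, so rows where both columns are NaN would also be listed as mismatches.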