python pandas

Dictionary of columns to compare in a DataFrame


I have a DataFrame with two columns (NAME1 and NAME). I would like to use a dictionary that maps column names to each other, loop over it, compare the paired columns, and show only the rows where the values differ.

When I try the following, I get the error "ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()".

import pandas as pd

# create df
test2 = {'NAME1': ['Tom', 'nick', 'krish', 'jack'],
         'NAME': ['Tom', 'nick', 'Carl', 'Bob']}
dfx = pd.DataFrame(test2)

# create dictionary
thisdict = {
  "NAME1": "NAME"
}

# loop and display differences (the `if` line below raises the ValueError)
for a, b in thisdict.items():
    if dfx[a] != dfx[b]:
        x = dfx[[a, b]]
        print(x)

Solution

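  • The comparison dfx[a] != dfx[b] is element-wise: it returns a boolean Series with one True/False per row, and an if statement cannot reduce that to a single truth value, which is what raises the ValueError. Instead, use the Series as a mask to keep only the rows where the two columns differ: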

    # Loop and display differences
    for a, b in thisdict.items():
        # Compare the columns row by row
        mismatches = dfx[dfx[a] != dfx[b]]
        if not mismatches.empty:
            print(f"Mismatches between '{a}' and '{b}':")
            print(mismatches[[a, b]])
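
    For reference, here is a self-contained sketch of the same approach that can be run as-is. The helper name find_mismatches is hypothetical (not part of pandas), and the sample data is copied from the question. It makes the reduction explicit with .any() and selects the mismatching rows with .loc:

```python
import pandas as pd

# Sample data copied from the question
dfx = pd.DataFrame({
    'NAME1': ['Tom', 'nick', 'krish', 'jack'],
    'NAME': ['Tom', 'nick', 'Carl', 'Bob'],
})
thisdict = {"NAME1": "NAME"}


def find_mismatches(df, column_pairs):
    """Hypothetical helper: map each (a, b) pair to the rows where df[a] != df[b]."""
    result = {}
    for a, b in column_pairs.items():
        mask = df[a] != df[b]   # boolean Series, one value per row
        if mask.any():          # explicitly reduce the Series to a single bool
            result[(a, b)] = df.loc[mask, [a, b]]
    return result


mismatches = find_mismatches(dfx, thisdict)
for (a, b), rows in mismatches.items():
    print(f"Mismatches between '{a}' and '{b}':")
    print(rows)
```

    One caveat: NaN != NaN evaluates to True, so rows where both columns are missing would also be reported as mismatches. If that matters for your data, you may want to handle missing values before comparing.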