python pandas data-science equality

Pandas: compare two data frames and look for duplicate rows


I want to compare df and df_equal, where df is built by concatenating several individual data frames:

    import pandas as pd

    df1 = pd.DataFrame([[ 'b', 'b', 'b' ]],
        columns=['a', 'b', 'c'])

Output:

        a   b   c
    0   b   b   b

    df2 = pd.DataFrame([[ 'x', 'x', 'x' ]],
        columns=['a', 'b', 'c'])

Output:

        a   b   c
    0   x   x   x

    df = pd.concat([df1, df2])

Output:

        a   b   c
    0   b   b   b
    0   x   x   x

    df_equal = pd.DataFrame([[ 'x', 'x', 'x' ]],
        columns=['a', 'b', 'c'])

How can I check each row of df to see whether it duplicates df_equal?

I tried .equals, but it does not give a result per row:

    for row in df:
        df.equals(df_equal)

My desired output:

    False  # first row in df
    True   # second row in df

Solution

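  • You could iterate over the rows by position and compare each one element-wise. For example, to compare every row of df to df2 (which has only one row, with the same values as df_equal):

    # Compare each row of df with the single row of df2; True only if all columns match
    for row in range(len(df)):
        print((df.iloc[row].values == df2.values).all())

  Output:

    False
    True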
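
The same row-by-row check can probably also be written without an explicit loop. The following is only a sketch, not part of the original answer: it assumes df_equal has exactly one row and the same column names as df, and it uses `DataFrame.eq` to compare every row of df against that single row, then `all(axis=1)` to require a match in every column. The name `is_duplicate` is made up for illustration.

```python
import pandas as pd

# Rebuild the frames from the question
df1 = pd.DataFrame([['b', 'b', 'b']], columns=['a', 'b', 'c'])
df2 = pd.DataFrame([['x', 'x', 'x']], columns=['a', 'b', 'c'])
df = pd.concat([df1, df2])
df_equal = pd.DataFrame([['x', 'x', 'x']], columns=['a', 'b', 'c'])

# Assumption: df_equal holds a single row, so take it as a Series.
# eq() aligns that Series with the columns of df and compares element-wise;
# all(axis=1) is True only for rows where every column matches.
is_duplicate = df.eq(df_equal.iloc[0]).all(axis=1)

for flag in is_duplicate:
    print(flag)
```

If df_equal could contain more than one row, this one-row comparison would no longer be enough and a different approach would be needed.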