pythonpandascompare

Pandas compare() how to iterate over rows


I am comparing two dfs using pandas compare()

updated_records = out['df1'].compare(out['df2']) 

    userid       amount
  self other self other
0   122  121  NaN   NaN
2  NaN   NaN  3.0   4.0

how to efficiently iterate over those rows to have output some thing like this (there are more columns than those two it is only example) i want to point which columns to show in report [userid, amount, but not others ]:

First row:
UserId: 122 -> 121

Second row:
amount: 3.0 -> 4.0

I need output like for report:

"For item XXX_XXX those fields where updated:
UserId: 122 -> 121,
xyz: a -> b,
             "


"For item XXX_YYY those fields where updated:
amount: 3.0 -> 4.0,
xyz: 45 -> ert

Basicly to to show changes

i was trying iterrows() but df get pivoted to something like this

            other    2024-06-06T11:23:35
SaveTS       self     2024-06-06T10:36:53
             other    2024-06-06T11:23:40
A            self                     UP4
             other                    UP5
B            self     2025-02-07T00:00:00
             other    2025-02-08T00:00:00
C            self              7 FEB 2025
             other             8 FEB 2025
D            self     2025-02-07T00:00:00
             other    2025-02-08T00:00:00
E            self                     7.0
             other                    8.0

and got stuck, as above i want to select A,B,C,E only to show in changes report

thanks


Solution

  • Example Code

    It wasn't clear what exact output style you wanted, so I provided it as a new example.

    import pandas as pd
    data = {
        ('userid', 'self'): [122, None],
        ('userid', 'other'): [121, None],
        ('amount', 'self'): [None, 3.0],
        ('amount', 'other'): [2, 4.0]
    }
    
    df = pd.DataFrame(data, index=[0, 2])
    

    df:

    enter image description here

    Code

    for idx, row in df.iterrows():
        print('For item {} those fields where updated:'.format(idx))
        colums_lv0 = row[row.notna()].index.get_level_values(0).drop_duplicates()
        for col in colums_lv0:        
            print('{} : {} -> {}'.format(col, row[(col, 'self')], row[(col, 'other')]))
        print()
    

    print:

    For item 0 those fields where updated:
    userid : 122.0 -> 121.0
    amount : nan -> 2.0
    
    For item 2 those fields where updated:
    amount : 3.0 -> 4.0