pythonpandasdataframemin

How to create DataFrame that is the minimal values based on 2 other DataFrame's in pandas (python)?


Let's say I have DataFrames df1 and df2:

>>> df1 = pd.DataFrame({'A': [0, 2, 4], 'B': [2, 17, 7], 'C': [4, 9, 11]})
>>> df1
   A   B   C
0  0   2   4
1  2  17   9
2  4   7  11
>>> df2 = pd.DataFrame({'A': [9, 2, 32], 'B': [1, 3, 8], 'C': [6, 2, 41]})
>>> df2
    A  B   C
0   9  1   6
1   2  3   2
2  32  8  41

What I want is the 3rd DataFrame that will have minimal rows (min is calculated based on column B), that is:

>>> df3
   A  B   C
0  9  1   6
1  2  3   2
2  4  7  11

I really don't want to do this by iterating over all rows and comparing them one by one, is there a faster and compact way to do this?


Solution

  • You can mask df1 with df2 when df2['B'] < df1['B']:

    out = df1.mask(df2['B'].lt(df1['B']), df2)
    

    Output:

       A  B   C
    0  9  1   6
    1  2  3   2
    2  4  7  11