pandas

Pandas get all rows where the values match a specific row


So I have this code

unique_values = df.drop_duplicates(keep=False)

for _, row in unique_values.iterrows():
    mask = (df == row).all(axis=1)
    df_subset = df[mask]

So the above is supposed to find all the unique rows in the dataframe then find all the rows that match those values in the original dataframe. For whatever reason the above code df_subset only returns a 1 row dataframe even if there are multiple rows in the original dataframe that match row. How do I get all rows with specific column values that are in a pandas row?

So for example if I have row=[0,01] and a dataframe that looks like [[0,0,1],[2,2,0],[0,0,1],[0,0,1],[0,0,1]] it should return indexes 0,2,3,4.


Solution

  • IIUC, you can use:

    >>> df[df.eq(row).all(axis=1)].index.tolist()
    [0, 2, 3, 4]
    

    Details:

    >>> pd.concat([df, (m := df.eq(row)), m.all(axis=1)],
                  keys=['df', 'mask', 'all'], axis=1)
    
      df         mask                  all
       0  1  2      0      1      2      0
    0  0  0  1   True   True   True   True
    1  2  2  0  False  False  False  False
    2  0  0  1   True   True   True   True
    3  0  0  1   True   True   True   True
    4  0  0  1   True   True   True   True
    

    Input dataframe:

    >>> df
       0  1  2
    0  0  0  1
    1  2  2  0
    2  0  0  1
    3  0  0  1
    4  0  0  1
    
    >>> row
    [0, 0, 1]