So I have this code
unique_values = df.drop_duplicates(keep=False)
for _, row in unique_values.iterrows():
mask = (df == row).all(axis=1)
df_subset = df[mask]
So the above is supposed to find all the unique rows in the dataframe then find all the rows that match those values in the original dataframe. For whatever reason the above code df_subset
only returns a 1 row dataframe even if there are multiple rows in the original dataframe that match row
. How do I get all rows with specific column values that are in a pandas row?
So for example if I have row=[0,01] and a dataframe that looks like [[0,0,1],[2,2,0],[0,0,1],[0,0,1],[0,0,1]] it should return indexes 0,2,3,4.
IIUC, you can use:
>>> df[df.eq(row).all(axis=1)].index.tolist()
[0, 2, 3, 4]
Details:
>>> pd.concat([df, (m := df.eq(row)), m.all(axis=1)],
keys=['df', 'mask', 'all'], axis=1)
df mask all
0 1 2 0 1 2 0
0 0 0 1 True True True True
1 2 2 0 False False False False
2 0 0 1 True True True True
3 0 0 1 True True True True
4 0 0 1 True True True True
Input dataframe:
>>> df
0 1 2
0 0 0 1
1 2 2 0
2 0 0 1
3 0 0 1
4 0 0 1
>>> row
[0, 0, 1]