I am trying to use pandas to select rows based on unique column pairs.
For example with the dataframe below read of of an csv:
col1 col2 col3
0 1 10 [a, b, c, d]
1 1 10 [e, f, g, h]
2 2 11 [a, b, c, d]
3 3 12 [i, j, k, l]
4 3 12 [e, f, g, h]
5 5 14 [a, b, c, d]
6 3 10 [m, n, o, p]
This will give me the unique pairs out of col1, col2
df_unique = df['col1', 'col2'].drop_duplicates()
However, I am not sure about how to use each row in df_unique to return a dataframe containing rows that match.
I believe that I could use merge here, but uncertain about the method to use to go about it.
df.merge(df_unique, on=['col1', 'col2'], how='left')
Something like below but an for loop seems like an inefficient way to do this:
for ['col1','col2'] in df_unique:
df_dict['col1, 'col2'] = df.merge(some_subframe, on=['col1', 'col2'], how='left')
Resulting in dataframes like so:
df_uniq_list[(1,10)]
col1 col2 col3
0 1 10 [a, b, c, d]
1 1 10 [e, f, g, h]
df_uniq_list[(2,11)]
col1 col2 col3
2 2 11 [a, b, c, d]
df_uniq_list[(3,12)]
col1 col2 col3
3 3 12 [i, j, k, l]
4 3 12 [e, f, g, h]
You could try with
df_uniq_list = dict([*df.groupby(['col1','col2'])])
df_uniq_list[(1,10)]
col1 col2 col3
0 1 10 [a, b, c, d]
1 1 10 [e, f, g, h]