pythonpandasset-comprehension

Return dataframes containing unique column pairs in Pandas?


I am trying to use pandas to select rows based on unique column pairs.

For example with the dataframe below read of of an csv:

    col1  col2   col3
0     1    10    [a, b, c, d]
1     1    10    [e, f, g, h]
2     2    11    [a, b, c, d]   
3     3    12    [i, j, k, l]   
4     3    12    [e, f, g, h]
5     5    14    [a, b, c, d]   
6     3    10    [m, n, o, p] 

This will give me the unique pairs out of col1, col2

df_unique = df['col1', 'col2'].drop_duplicates()

However, I am not sure about how to use each row in df_unique to return a dataframe containing rows that match.

I believe that I could use merge here, but uncertain about the method to use to go about it.

df.merge(df_unique, on=['col1', 'col2'], how='left')

Something like below but an for loop seems like an inefficient way to do this:

for ['col1','col2'] in df_unique:
    df_dict['col1, 'col2'] = df.merge(some_subframe,  on=['col1', 'col2'], how='left')

Resulting in dataframes like so:

df_uniq_list[(1,10)]
    col1  col2   col3 
0     1    10    [a, b, c, d]
1     1    10    [e, f, g, h]

df_uniq_list[(2,11)]
    col1  col2   col3
2     2    11    [a, b, c, d]   

df_uniq_list[(3,12)]
    col1  col2   col3
3     3    12    [i, j, k, l]   
4     3    12    [e, f, g, h]

Solution

  • You could try with

    df_uniq_list = dict([*df.groupby(['col1','col2'])])
    df_uniq_list[(1,10)]
        col1  col2   col3 
    0     1    10    [a, b, c, d]
    1     1    10    [e, f, g, h]