pythonpandaslistpandas-groupbyapply

I have a pandas column with lists. Group rows that contains atleast one common element from same column


I have pandas df with 1 column with lists. I would like to group all lists which have at least one element in common.

Input Df :
> 
    Category
 0  [IAB19, IAB81, IAB82]
 1  [IAB25, IAB27]
 2  [IAB19, IAB20]
 3  [IAB22, IAB55, IAB56, IAB58]
 4  [IAB81, IAB89]
 5  [IAB82, IAB95]

i want to find out if any codes in df['Category'] is present in any other row of df['Category'].

And if yes , I would want to merge those lists sharing at least 1 common element.

Expected Output :

    Category
 0  [IAB19, IAB81, IAB82, IAB20, IAB89, IAB95]
 1  [IAB25, IAB27]
 2  [IAB22, IAB55, IAB56, IAB58]

Any thoughts ?


Solution

  • This is a hidden network problem , so we can trynetworkx , but before that you may need to explode the whole list columns to single row item (function available after pandas 0.25 )

    import networkx as nx
    df['Key']=df.index
    df=df.explode('Category')
    G=nx.from_pandas_edgelist(df, 'Category', 'Key')
    l=list(nx.connected_components(G))
    L=[dict.fromkeys(y,x) for x, y in enumerate(l)]
    d={k: v for d in L for k, v in d.items()}
    s=df.groupby(df.Key.map(d)).Category.apply(set)
    s
    Key
    0    {IAB89, IAB82, IAB19, IAB95, IAB81, IAB20}
    1                                {IAB27, IAB25}
    2                  {IAB55, IAB56, IAB22, IAB58}
    Name: Category, dtype: object