[SOLVED] Pandas: How to combine several rows with the same column value and create a new Dataframe which covers all possibilities?

Pandas: How to combine several rows with the same column value and create a new Dataframe which covers all possibilities?

There exists a DataFrame like this:

id	name	age
0x0	Hans	32
0x0	Peter	21
0x1	Jan	42
0x1	Simon	25
0x1	Klaus	51
0x1	Franz	72

I'm aiming to create a DataFrame that covers any possible combination within the same ID.

The only possibility for ID 0x0 is Hans and Peter. Since ID 0x1 exists four times, there are six possible solutions, as shown in the table below.

Since this is an example, one ID could also exist three, five, seven, ... times which results in more or less possibilities.

id	name0	age0	name1	age1
0x0	Hans	32	Peter	21
0x1	Jan	42	Simon	25
0x1	Jan	42	Klaus	51
0x1	Jan	42	Franz	72
0x1	Simon	25	Klaus	51
0x1	Simon	25	Franz	72
0x1	Klaus	51	Franz	72

Using combinations, I can already cover the possibility aspect, but I am losing the ages of each name on the way.

import pandas as pd
from  itertools import combinations

data = pd.DataFrame({'id': ["0x0", "0x0", "0x1", "0x1", "0x1", "0x1"], 'name': ["Hans","Peter","Jan","Simon","Klaus","Franz"], 'age': [32, 21, 42, 25, 51, 72]})

df = (data.groupby('id')['name'].apply(lambda x: pd.DataFrame(list(combinations(x,2))))
        .reset_index(level=1, drop=True)
        .reset_index())
print(df)

Solution

Core python itertools combinations is the solution. merge() to get the age

import itertools
df = pd.read_csv(io.StringIO("""id  name    age
0x0 Hans    32
0x0 Peter   21
0x1 Jan 42
0x1 Simon   25
0x1 Klaus   51
0x1 Franz   72"""), sep="\t")

df1 = (
df
    .groupby(["id"])["name"]
    .apply(lambda x: pd.DataFrame(itertools.combinations(list(x),2)))
    .reset_index()
    .merge(df, left_on=["id",0], right_on=["id","name"])
    .merge(df, left_on=["id",1], right_on=["id","name"], suffixes=("0","1"))
    .drop(columns=["level_1",0,1])
)

output

  id  name0  age0  name1  age1
 0x0   Hans    32  Peter    21
 0x1    Jan    42  Simon    25
 0x1    Jan    42  Klaus    51
 0x1  Simon    25  Klaus    51
 0x1    Jan    42  Franz    72
 0x1  Simon    25  Franz    72
 0x1  Klaus    51  Franz    72