There exists a DataFrame like this:
id | name | age |
---|---|---|
0x0 | Hans | 32 |
0x0 | Peter | 21 |
0x1 | Jan | 42 |
0x1 | Simon | 25 |
0x1 | Klaus | 51 |
0x1 | Franz | 72 |
I'm aiming to create a DataFrame that covers any possible combination within the same ID.
The only possibility for ID 0x0 is Hans and Peter. Since ID 0x1 exists four times, there are six possible solutions, as shown in the table below.
Since this is an example, one ID could also exist three, five, seven, ... times which results in more or less possibilities.
id | name0 | age0 | name1 | age1 |
---|---|---|---|---|
0x0 | Hans | 32 | Peter | 21 |
0x1 | Jan | 42 | Simon | 25 |
0x1 | Jan | 42 | Klaus | 51 |
0x1 | Jan | 42 | Franz | 72 |
0x1 | Simon | 25 | Klaus | 51 |
0x1 | Simon | 25 | Franz | 72 |
0x1 | Klaus | 51 | Franz | 72 |
Using combinations, I can already cover the possibility aspect, but I am losing the ages of each name on the way.
import pandas as pd
from itertools import combinations
data = pd.DataFrame({'id': ["0x0", "0x0", "0x1", "0x1", "0x1", "0x1"], 'name': ["Hans","Peter","Jan","Simon","Klaus","Franz"], 'age': [32, 21, 42, 25, 51, 72]})
df = (data.groupby('id')['name'].apply(lambda x: pd.DataFrame(list(combinations(x,2))))
.reset_index(level=1, drop=True)
.reset_index())
print(df)
Core python itertools combinations is the solution. merge()
to get the age
import itertools
df = pd.read_csv(io.StringIO("""id name age
0x0 Hans 32
0x0 Peter 21
0x1 Jan 42
0x1 Simon 25
0x1 Klaus 51
0x1 Franz 72"""), sep="\t")
df1 = (
df
.groupby(["id"])["name"]
.apply(lambda x: pd.DataFrame(itertools.combinations(list(x),2)))
.reset_index()
.merge(df, left_on=["id",0], right_on=["id","name"])
.merge(df, left_on=["id",1], right_on=["id","name"], suffixes=("0","1"))
.drop(columns=["level_1",0,1])
)
id name0 age0 name1 age1
0x0 Hans 32 Peter 21
0x1 Jan 42 Simon 25
0x1 Jan 42 Klaus 51
0x1 Simon 25 Klaus 51
0x1 Jan 42 Franz 72
0x1 Simon 25 Franz 72
0x1 Klaus 51 Franz 72