I have created a pandas dataframe as follows:
import pandas as pd

ds = {'col1' : ["A","B"], 'probability' : [0.3, 0.6]}
df = pd.DataFrame(data=ds)
The dataframe looks like this:
print(df)
  col1  probability
0    A          0.3
1    B          0.6
I need to create a new dataframe that duplicates each row and assigns to the duplicated row the complementary probability, so that the two probabilities for each value of col1 sum to 1.
For the example above, the resulting dataframe should look like this:
  col1  probability
0    A          0.3
1    A          0.7
2    B          0.6
3    B          0.4
Can anyone help me do this in pandas, please?
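You can build the complementary rows with assign (replacing probability with 1 - probability) and append them to the original with pd.concat: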
df = pd.concat([df, df.assign(probability=1 - df["probability"])], ignore_index=True)
  col1  probability
0    A          0.3
1    B          0.6
2    A          0.7
3    B          0.4
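Note that the output above lists the complementary rows after all the original rows, whereas the question shows each complement directly below its source row. If that order matters, one possible approach (a sketch, not part of the original answer) is to keep the original index during the concat, sort on it with a stable algorithm, and only then reset the index. The names complement and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"col1": ["A", "B"], "probability": [0.3, 0.6]})

# Complementary rows: same col1, probability replaced by 1 - probability.
complement = df.assign(probability=1 - df["probability"])

# Concatenate without resetting the index, so each complement shares the
# index label of its source row. A stable sort (mergesort) on that index
# then places every complement right after its original row.
result = (
    pd.concat([df, complement])
    .sort_index(kind="mergesort")
    .reset_index(drop=True)
)

print(result)
```

The stable sort is what keeps the original row ahead of its complement; with a non-stable sort the relative order of rows sharing an index label is not guaranteed.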