I have created the following pandas dataframe:
import pandas as pd
ds = {'col1' : ['A','A','B','C','C','D'],
'col2' : ['A','B','C','D','D','A']}
df = pd.DataFrame(data=ds)
The dataframe looks like this:
print(df)
col1 col2
0 A A
1 A B
2 B C
3 C D
4 C D
5 D A
The possible values in col1
and col2
are A
, B
, C
and D
.
I need to create 4 new columns, called:
countA
: it counts how many A
are in each row / recordcountB
: it counts how many B
are in each row / recordcountC
: it counts how many C
are in each row / recordcountD
: it counts how many D
are in each row / recordSo, from the example above, the resulting dataframe would look like this:
Can anyone help me please?
Here is a way using pd.get_dummies()
df.join(pd.get_dummies(df,prefix='',prefix_sep='').T.groupby(level=0).sum().T.rename('count{}'.format,axis=1))
and here is a way using value_counts()
df.join(df.stack().groupby(level=0).value_counts().unstack(fill_value = 0).rename('count{}'.format,axis=1))
Output:
col1 col2 countA countB countC countD
0 A A 2 0 0 0
1 A B 1 1 0 0
2 B C 0 1 1 0
3 C D 0 0 1 1
4 C D 0 0 1 1
5 D A 1 0 0 1