python pandas counting np

Filling a column with the number of duplicated values in another column


I have a df like this:

month outcome mom.ret
10/20 winner 0.2
10/20 winner 0.9
11/20 winner 0.6
11/20 winner 0.2
11/20 winner 0.9
10/20 loser 0.6
10/20 loser 0.2
10/20 loser 0.9
11/20 loser 0.6

I would like to add another column containing 1 divided by the number of times the value "winner" or "loser" appears in the outcome column for each month. The expected output for the example df is:

month outcome mom.ret q
10/20 winner 0.2 1/2
10/20 winner 0.9 1/2
11/20 winner 0.6 1/3
11/20 winner 0.2 1/3
11/20 winner 0.9 1/3
10/20 loser 0.6 1/3
10/20 loser 0.2 1/3
10/20 loser 0.9 1/3
11/20 loser 0.6 1/1

I thought of using count to find how many times each value is repeated, but I need the count to be done separately for each month. Any ideas?


Solution

  • You can use this code to achieve what you want, assuming your original DataFrame is called df:

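    import pandas as pd

    # Count the rows in each (month, outcome) group; the count lands in 'mom.ret'
    counts = df.groupby(['month', 'outcome'], as_index=False).count()
    counts = counts.rename(columns={'mom.ret': 'q'})
    # Use this line if you want the float value of the division (e.g. 0.5)
    # counts['q'] = 1/counts['q']
    # Use this line if you want the string '1/2'
    counts['q'] = counts['q'].apply(lambda x: f'1/{x}')
    # Attach the per-group value to every row of the original DataFrame
    result = pd.merge(df, counts, on=['month', 'outcome'])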
    

    For the example df in the question, the result looks like this:

       month outcome  mom.ret    q
    0  10/20  winner      0.2  1/2
    1  10/20  winner      0.9  1/2
    2  11/20  winner      0.6  1/3
    3  11/20  winner      0.2  1/3
    4  11/20  winner      0.9  1/3
    5  10/20   loser      0.6  1/3
    6  10/20   loser      0.2  1/3
    7  10/20   loser      0.9  1/3
    8  11/20   loser      0.6  1/1
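
As an alternative to the merge, the same per-group count can be written straight back onto each row with `groupby(...).transform`. This is a different technique from the answer above, shown only as a minimal sketch. It rebuilds the example df from the question, and the names `group_size` and `q_str` are made up for illustration.

```python
import pandas as pd

# Example data copied from the question
df = pd.DataFrame({
    "month": ["10/20", "10/20", "11/20", "11/20", "11/20",
              "10/20", "10/20", "10/20", "11/20"],
    "outcome": ["winner"] * 5 + ["loser"] * 4,
    "mom.ret": [0.2, 0.9, 0.6, 0.2, 0.9, 0.6, 0.2, 0.9, 0.6],
})

# Number of rows in each (month, outcome) group, aligned with df's index
group_size = df.groupby(["month", "outcome"])["mom.ret"].transform("size")

df["q"] = 1 / group_size                     # float value, e.g. 0.5
df["q_str"] = "1/" + group_size.astype(str)  # string form, e.g. '1/2'
```

Note that `"size"` counts every row in a group, while `count()` in the answer above skips rows where `mom.ret` is missing. The two agree here because the example has no missing values.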