pythonstringpandascumulative-frequency

Cumulative frequency for string occurence


To start off, a little about my problem.

I have a data frame of winners of the champions league cup indexed by years.

Like this, note team names are strings.

year    team         need this         year    team      wins to date

1       team1                           1       team1          1
2       team2                           2       team2          1
3       team1                           3       team1          2
4       team3                           4       team3          1

I would like to create a new column with cumulative wins, so that i can then plot a line graph of each team and the number of times they have won the cup. ie I need the year on the x axis and the cumulative frequency on the y, with 4 different lines for the top 4 teams.

There is probably an easy way using the count function and a loop but I'm fairly new to python and don't have a very good grasp of these. Any help would be much appreciated!


Solution

  • Look at pandas documentation on groupby, transform, cumcount, and cumsum (http://pandas.pydata.org/pandas-docs/stable/groupby.html). You can groupby team and use transform to do a cumsum on bool result of whether the team is null or not.

    df
    

    Input:

       year   team
    0     1  team1
    1     2  team2
    2     3  team1
    3     4  team3
    

    Do transformation

    df['wins to date'] = df.groupby('team').transform(lambda x: x.notnull().cumsum())
    df
    

    or as DSM suggest:

    df['wins to date'] = df.groupby('team').cumcount()+1
    

    output

       year   team  wins to date
    0     1  team1             1
    1     2  team2             1
    2     3  team1             2
    3     4  team3             1