pythonpandasdataframe

Applying function to filtered columns in Pandas


I have a Pandas Dataframe with 4 different columns: an ID, country, team and a color that is assigned to each player following a specific order.

I want to create a new column that contains a number based on the team and the country that simply counts up following the color order, however colors may appear more than once per team. The column "ID" has to be simply sorted according to the alphabet, then the country column has to be filtered by country, then the script needs to check what teams are in what country and accordingly filter by team, then sort team by the color code and number the first team, then filter the country for the next team, sort again by color but CONTINUE the counting until all teams of a country are numbered. Then the next country gets filtered and the numbering starts again from 1 with the first team of that country.

It sounds complicated and I have an example code here. I apologize, it is not small but I figure it needs to be of a certain size to make the problem more understandable.

I used df = df.sort_values(by='ID') to sort the column ID by alphabet and I sorted the column 'color' by making it categorical using df['Color'] = pd.Categorical(df['Color'], colorcode) (similar to the custom sorting in Excel)

I have added the column Result to the example which shows what I am trying to reach programmatically. It does not matter whether the Result numbers are integers or strings.

Here the example:

import pandas as pd

pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

colorcode = ['red', 'green', 'blue', 'yellow', 'white', 'grey', 'brown', 'violet', 'turquoise', 'black', 'orange', 'pink', 'red2', 'green2', 'blue2', 'yellow2', 'white2', 'grey2', 'brown2', 'violet2', 'turquoise2', 'black2', 'orange2', 'pink2']

data = {
'ID' : ['12318683-999', '12318683-001', '12318687-999', '12318687-001', '12318684-999', '12318684-001', '12318686-999', '12318686-001', '12318685-999', '12318685-001', '12319256-999', '12319256-004', '12319256-003', '12319256-002', '12319256-001', '12319255-999', '12319255-002', '12319255-001', '12317944-999', '12317944-009', '12317944-008', '12317944-007', '12317944-006', '12317944-005', '12317944-004', '12317944-003', '12317944-002', '12317944-010', '12317944-001', '12317942-006', '12317942-005', '12317942-004', '12317942-003', '12317942-002', '12317942-001', '12317943-006', '12317943-005', '12317943-004', '12317943-003', '12317943-002', '12317943-001', '12317941-999', '12317941-009', '12317941-008', '12317941-007', '12317941-006', '12317941-005', '12317941-004', '12317941-003', '12317941-002', '12317941-001', '12319261-999', '12319261-001', '12319260-999', '12319260-001', '12319259-999', '12319259-001', '12319095-999', '12319095-001', '12319258-999', '12319258-002', '12319258-001', '12319257-999', '12319257-001', '12319262-999', '12319262-003', '12319262-002', '12319262-001', '12319264-006', '12319264-005', '12319264-004', '12319264-003', '12319264-002', '12319264-001', '12319263-006', '12319263-005', '12319263-004', '12319263-003', '12319263-002', '12319263-001', '12318985-009', '12318985-008', '12318985-007', '12318985-006', '12318985-005', '12318985-004', '12318985-003', '12318985-002', '12318985-012', '12318985-011', '12318985-010', '12318985-001', '12318986-999', '12318986-004', '12318986-003', '12318986-002', '12318986-001', '12317719-999', '12317719-003', '12317719-002', '12317719-001', '12319310-999', '12319310-003', '12319310-002', '12319310-001', '12317718-999', '12317718-002', '12317718-001', '12319311-999', '12319311-001', '12317720-999', '12317720-001', '12319319-999', '12319319-008', '12319319-007', '12319319-006', '12319319-005', '12319319-004', '12319319-003', '12319319-002', '12319319-001', '12317721-999', '12317721-001', '12318721-999', '12318721-001', '12318716-999', '12318716-001', '12318724-999', '12318724-001', '12318725-999', '12318725-004', '12318725-003', '12318725-002', '12318725-001', '12318726-999', '12318726-001', '12318715-999', '12318715-001', '12318718-999', '12318718-001', '12319123-999', '12319123-003', '12319123-002', '12319123-001', '12318714-999', '12318714-001', '12319118-999', '12319118-002', '12319118-001', '12318713-999', '12318713-001', '12319121-999', '12319121-004', '12319121-003', '12319121-002', '12319121-001', '12318727-999', '12318727-001', '12319116-999', '12319116-003', '12319116-002', '12319116-001', '12319119-999', '12319119-002', '12319119-001', '12319120-999', '12319120-003', '12319120-002', '12319120-001', '12319304-999', '12319304-005', '12319304-004', '12319304-003', '12319304-002', '12319304-001', '12319122-999', '12319122-002', '12319122-001', '12319117-999', '12319117-005', '12319117-004', '12319117-003', '12319117-002', '12319117-001', '12319305-999', '12319305-001', '12319306-999', '12319306-001', '23149872-999', '23149872-002', '23149872-001', '12320092-999', '12320092-002', '12320092-001', '12320093-999', '12320093-002', '12320093-001', '12320095-999', '12320095-001', '12318669-999', '12318669-002', '12318669-001', '12318364-999', '12318364-001', '12318366-999', '12318366-001', '12318365-999', '12318365-001', '12318644-999', '12318644-001'],
'Country': ['UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'USA', 'USA', 'USA', 'USA', 'Germany', 'Germany', 'USA', 'USA', 'USA', 'Germany', 'Germany', 'USA', 'USA', 'USA', 'USA', 'USA', 'Germany', 'Germany', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'Germany', 'Germany', 'Germany', 'Germany', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'USA', 'USA'],
'Team' : ['Team6', 'Team6', 'Team6', 'Team6', 'Team6', 'Team6', 'Team6', 'Team6', 'Team6', 'Team6', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team3', 'Team3', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team2', 'Team2', 'Team2', 'Team2', 'Team2', 'Team2', 'Team2', 'Team2', 'Team2', 'Team2', 'Team2', 'Team2', 'Team2', 'Team2', 'Team2', 'Team2', 'Team2', 'Team2', 'Team2', 'Team2', 'Team2', 'Team2', 'Team2', 'Team2', 'Team2', 'Team2', 'Team1', 'Team1', 'Team1', 'Team1', 'Team1', 'Team1', 'Team1', 'Team1', 'Team1', 'Team1', 'Team1', 'Team1', 'Team1', 'Team1', 'Team1', 'Team1', 'Team1', 'Team3', 'Team3', 'Team3', 'Team3', 'Team1', 'Team1', 'Team3', 'Team3', 'Team3', 'Team1', 'Team1', 'Team3', 'Team3', 'Team3', 'Team3', 'Team3', 'Team1', 'Team1', 'Team3', 'Team3', 'Team3', 'Team3', 'Team3', 'Team3', 'Team3', 'Team3', 'Team3', 'Team3', 'Team3', 'Team2', 'Team2', 'Team2', 'Team2', 'Team2', 'Team2', 'Team3', 'Team3', 'Team3', 'Team3', 'Team3', 'Team3', 'Team3', 'Team3', 'Team3', 'Team2', 'Team2', 'Team2', 'Team2', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team4', 'Team4'],
'Color' : ['red', 'red', 'green', 'green', 'blue', 'blue', 'yellow', 'yellow', 'white', 'white', 'violet', 'violet', 'violet', 'violet', 'violet', 'brown', 'brown', 'brown', 'yellow', 'yellow', 'yellow', 'yellow', 'yellow', 'yellow', 'yellow', 'yellow', 'yellow', 'yellow', 'yellow', 'blue', 'blue', 'blue', 'blue', 'blue', 'blue', 'green', 'green', 'green', 'green', 'green', 'green', 'red', 'red', 'red', 'red', 'red', 'red', 'red', 'red', 'red', 'red', 'pink', 'pink', 'red-2', 'red-2', 'green-2', 'green-2', 'turquoise', 'turquoise', 'blue-2', 'blue-2', 'blue-2', 'yellow-2', 'yellow-2', 'turquoise', 'turquoise', 'turquoise', 'turquoise', 'orange', 'orange', 'orange', 'orange', 'orange', 'orange', 'black', 'black', 'black', 'black', 'black', 'black', 'grey', 'grey', 'grey', 'grey', 'grey', 'grey', 'grey', 'grey', 'grey', 'grey', 'grey', 'grey', 'white', 'white', 'white', 'white', 'white', 'grey', 'grey', 'grey', 'grey', 'green', 'green', 'green', 'green', 'white', 'white', 'white', 'brown', 'brown', 'yellow', 'yellow', 'red', 'red', 'red', 'red', 'red', 'red', 'red', 'red', 'red', 'blue', 'blue', 'grey', 'grey', 'white', 'white', 'yellow', 'yellow', 'blue', 'blue', 'blue', 'blue', 'blue', 'black', 'black', 'turquoise', 'turquoise', 'red', 'red', 'red', 'red', 'red', 'red', 'green', 'green', 'green', 'green', 'green', 'violet', 'violet', 'blue', 'blue', 'blue', 'blue', 'blue', 'brown', 'brown', 'yellow', 'yellow', 'yellow', 'yellow', 'white', 'white', 'white', 'grey', 'grey', 'grey', 'grey', 'black', 'black', 'black', 'black', 'black', 'black', 'brown', 'brown', 'brown', 'violet', 'violet', 'violet', 'violet', 'violet', 'violet', 'turquoise', 'turquoise', 'violet', 'violet', 'grey', 'grey', 'grey', 'white', 'white', 'white', 'yellow', 'yellow', 'yellow', 'blue', 'blue', 'green', 'green', 'green', 'turquoise', 'turquoise', 'violet', 'violet', 'brown', 'brown', 'red', 'red'],
'Result' : ['10', '10', '11', '11', '12', '12', '13', '13', '14', '14', '17', '17', '17', '17', '17', '16', '16', '16', '4', '4', '4', '4', '4', '4', '4', '4', '4', '4', '4', '3', '3', '3', '3', '3', '3', '2', '2', '2', '2', '2', '2', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '21', '21', '22', '22', '23', '23', '9', '9', '24', '24', '24', '25', '25', '18', '18', '18', '18', '20', '20', '20', '20', '20', '20', '19', '19', '19', '19', '19', '19', '6', '6', '6', '6', '6', '6', '6', '6', '6', '6', '6', '6', '5', '5', '5', '5', '5', '16', '16', '16', '16', '12', '12', '12', '12', '15', '15', '15', '17', '17', '14', '14', '11', '11', '11', '11', '11', '11', '11', '11', '11', '13', '13', '6', '6', '5', '5', '4', '4', '3', '3', '3', '3', '3', '10', '10', '9', '9', '1', '1', '1', '1', '1', '1', '2', '2', '2', '2', '2', '8', '8', '3', '3', '3', '3', '3', '7', '7', '4', '4', '4', '4', '5', '5', '5', '6', '6', '6', '6', '20', '20', '20', '20', '20', '20', '7', '7', '7', '8', '8', '8', '8', '8', '8', '19', '19', '18', '18', '15', '15', '15', '14', '14', '14', '13', '13', '13', '12', '12', '11', '11', '11', '9', '9', '8', '8', '7', '7', '10', '10']
}

df = pd.DataFrame(data)

df = df.sort_values(by='ID') # This line sorts the column ID by alphabet
df['Color'] = pd.Categorical(df['Color'], colorcode)
df = pd.DataFrame(data)

print(df)

My problem is that I cannot figure out how to filter the columns (first Country, then Team) and then count up according to the color, starting from 1 for red and not start at 1 again for the next team as long as I am still in the same country.


Solution

  • Convert the color column to ordered categorical type then group the dataframe by the combination of Country, Team and Color then use ngroup to assign ordered group numbers to each unique combination. Then for each Country rank the group numbers using dense method

    df['Color'] = pd.Categorical(df['Color'], colorcode, ordered=True)
    
    df['Rank'] = df.groupby(['Country', 'Team', 'Color'], sort=True, dropna=False).ngroup()
    df['Rank'] = df.groupby(['Country'])['Rank'].rank(method='dense').astype(int)
    

    Result

                   ID  Country   Team      Color  Result  Rank
    139  12318718-001  Germany  Team1        red       1     1
    138  12318718-999  Germany  Team1        red       1     1
    144  12318714-999  Germany  Team1      green       2     2
    145  12318714-001  Germany  Team1      green       2     2
    129  12318725-999  Germany  Team1       blue       3     3
    130  12318725-004  Germany  Team1       blue       3     3
    131  12318725-003  Germany  Team1       blue       3     3
    132  12318725-002  Germany  Team1       blue       3     3
    133  12318725-001  Germany  Team1       blue       3     3
    127  12318724-999  Germany  Team1     yellow       4     4
    128  12318724-001  Germany  Team1     yellow       4     4
    126  12318716-001  Germany  Team1      white       5     5
    125  12318716-999  Germany  Team1      white       5     5
    123  12318721-999  Germany  Team1       grey       6     6
    124  12318721-001  Germany  Team1       grey       6     6
    156  12318727-999  Germany  Team1      brown       7     7
    157  12318727-001  Germany  Team1      brown       7     7
    149  12318713-999  Germany  Team1     violet       8     8
    150  12318713-001  Germany  Team1     violet       8     8
    136  12318715-999  Germany  Team1  turquoise       9     9
    137  12318715-001  Germany  Team1  turquoise       9     9
    134  12318726-999  Germany  Team1      black      10    10
    135  12318726-001  Germany  Team1      black      10    10
    115  12319319-006  Germany  Team2        red      11    11
    113  12319319-008  Germany  Team2        red      11    11
    112  12319319-999  Germany  Team2        red      11    11
    120  12319319-001  Germany  Team2        red      11    11
    119  12319319-002  Germany  Team2        red      11    11
    118  12319319-003  Germany  Team2        red      11    11
    117  12319319-004  Germany  Team2        red      11    11
    116  12319319-005  Germany  Team2        red      11    11
    114  12319319-007  Germany  Team2        red      11    11
    104  12319310-001  Germany  Team2      green      12    12
    102  12319310-003  Germany  Team2      green      12    12
    ...