I have a Pandas Dataframe with 4 different columns: an ID, country, team and a color that is assigned to each player following a specific order.
I want to create a new column that contains a number based on the team and the country that simply counts up following the color order, however colors may appear more than once per team. The column "ID" has to be simply sorted according to the alphabet, then the country column has to be filtered by country, then the script needs to check what teams are in what country and accordingly filter by team, then sort team by the color code and number the first team, then filter the country for the next team, sort again by color but CONTINUE the counting until all teams of a country are numbered. Then the next country gets filtered and the numbering starts again from 1 with the first team of that country.
It sounds complicated and I have an example code here. I apologize, it is not small but I figure it needs to be of a certain size to make the problem more understandable.
I used df = df.sort_values(by='ID')
to sort the column ID by alphabet and I sorted the column 'color' by making it categorical using df['Color'] = pd.Categorical(df['Color'], colorcode)
(similar to the custom sorting in Excel)
I have added the column Result to the example which shows what I am trying to reach programmatically. It does not matter whether the Result numbers are integers or strings.
Here the example:
import pandas as pd
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
colorcode = ['red', 'green', 'blue', 'yellow', 'white', 'grey', 'brown', 'violet', 'turquoise', 'black', 'orange', 'pink', 'red2', 'green2', 'blue2', 'yellow2', 'white2', 'grey2', 'brown2', 'violet2', 'turquoise2', 'black2', 'orange2', 'pink2']
data = {
'ID' : ['12318683-999', '12318683-001', '12318687-999', '12318687-001', '12318684-999', '12318684-001', '12318686-999', '12318686-001', '12318685-999', '12318685-001', '12319256-999', '12319256-004', '12319256-003', '12319256-002', '12319256-001', '12319255-999', '12319255-002', '12319255-001', '12317944-999', '12317944-009', '12317944-008', '12317944-007', '12317944-006', '12317944-005', '12317944-004', '12317944-003', '12317944-002', '12317944-010', '12317944-001', '12317942-006', '12317942-005', '12317942-004', '12317942-003', '12317942-002', '12317942-001', '12317943-006', '12317943-005', '12317943-004', '12317943-003', '12317943-002', '12317943-001', '12317941-999', '12317941-009', '12317941-008', '12317941-007', '12317941-006', '12317941-005', '12317941-004', '12317941-003', '12317941-002', '12317941-001', '12319261-999', '12319261-001', '12319260-999', '12319260-001', '12319259-999', '12319259-001', '12319095-999', '12319095-001', '12319258-999', '12319258-002', '12319258-001', '12319257-999', '12319257-001', '12319262-999', '12319262-003', '12319262-002', '12319262-001', '12319264-006', '12319264-005', '12319264-004', '12319264-003', '12319264-002', '12319264-001', '12319263-006', '12319263-005', '12319263-004', '12319263-003', '12319263-002', '12319263-001', '12318985-009', '12318985-008', '12318985-007', '12318985-006', '12318985-005', '12318985-004', '12318985-003', '12318985-002', '12318985-012', '12318985-011', '12318985-010', '12318985-001', '12318986-999', '12318986-004', '12318986-003', '12318986-002', '12318986-001', '12317719-999', '12317719-003', '12317719-002', '12317719-001', '12319310-999', '12319310-003', '12319310-002', '12319310-001', '12317718-999', '12317718-002', '12317718-001', '12319311-999', '12319311-001', '12317720-999', '12317720-001', '12319319-999', '12319319-008', '12319319-007', '12319319-006', '12319319-005', '12319319-004', '12319319-003', '12319319-002', '12319319-001', '12317721-999', '12317721-001', '12318721-999', '12318721-001', '12318716-999', '12318716-001', '12318724-999', '12318724-001', '12318725-999', '12318725-004', '12318725-003', '12318725-002', '12318725-001', '12318726-999', '12318726-001', '12318715-999', '12318715-001', '12318718-999', '12318718-001', '12319123-999', '12319123-003', '12319123-002', '12319123-001', '12318714-999', '12318714-001', '12319118-999', '12319118-002', '12319118-001', '12318713-999', '12318713-001', '12319121-999', '12319121-004', '12319121-003', '12319121-002', '12319121-001', '12318727-999', '12318727-001', '12319116-999', '12319116-003', '12319116-002', '12319116-001', '12319119-999', '12319119-002', '12319119-001', '12319120-999', '12319120-003', '12319120-002', '12319120-001', '12319304-999', '12319304-005', '12319304-004', '12319304-003', '12319304-002', '12319304-001', '12319122-999', '12319122-002', '12319122-001', '12319117-999', '12319117-005', '12319117-004', '12319117-003', '12319117-002', '12319117-001', '12319305-999', '12319305-001', '12319306-999', '12319306-001', '23149872-999', '23149872-002', '23149872-001', '12320092-999', '12320092-002', '12320092-001', '12320093-999', '12320093-002', '12320093-001', '12320095-999', '12320095-001', '12318669-999', '12318669-002', '12318669-001', '12318364-999', '12318364-001', '12318366-999', '12318366-001', '12318365-999', '12318365-001', '12318644-999', '12318644-001'],
'Country': ['UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'USA', 'USA', 'USA', 'USA', 'Germany', 'Germany', 'USA', 'USA', 'USA', 'Germany', 'Germany', 'USA', 'USA', 'USA', 'USA', 'USA', 'Germany', 'Germany', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'Germany', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'Germany', 'Germany', 'Germany', 'Germany', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'UK', 'UK', 'UK', 'UK', 'UK', 'UK', 'USA', 'USA'],
'Team' : ['Team6', 'Team6', 'Team6', 'Team6', 'Team6', 'Team6', 'Team6', 'Team6', 'Team6', 'Team6', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team3', 'Team3', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team2', 'Team2', 'Team2', 'Team2', 'Team2', 'Team2', 'Team2', 'Team2', 'Team2', 'Team2', 'Team2', 'Team2', 'Team2', 'Team2', 'Team2', 'Team2', 'Team2', 'Team2', 'Team2', 'Team2', 'Team2', 'Team2', 'Team2', 'Team2', 'Team2', 'Team2', 'Team1', 'Team1', 'Team1', 'Team1', 'Team1', 'Team1', 'Team1', 'Team1', 'Team1', 'Team1', 'Team1', 'Team1', 'Team1', 'Team1', 'Team1', 'Team1', 'Team1', 'Team3', 'Team3', 'Team3', 'Team3', 'Team1', 'Team1', 'Team3', 'Team3', 'Team3', 'Team1', 'Team1', 'Team3', 'Team3', 'Team3', 'Team3', 'Team3', 'Team1', 'Team1', 'Team3', 'Team3', 'Team3', 'Team3', 'Team3', 'Team3', 'Team3', 'Team3', 'Team3', 'Team3', 'Team3', 'Team2', 'Team2', 'Team2', 'Team2', 'Team2', 'Team2', 'Team3', 'Team3', 'Team3', 'Team3', 'Team3', 'Team3', 'Team3', 'Team3', 'Team3', 'Team2', 'Team2', 'Team2', 'Team2', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team4', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team5', 'Team4', 'Team4'],
'Color' : ['red', 'red', 'green', 'green', 'blue', 'blue', 'yellow', 'yellow', 'white', 'white', 'violet', 'violet', 'violet', 'violet', 'violet', 'brown', 'brown', 'brown', 'yellow', 'yellow', 'yellow', 'yellow', 'yellow', 'yellow', 'yellow', 'yellow', 'yellow', 'yellow', 'yellow', 'blue', 'blue', 'blue', 'blue', 'blue', 'blue', 'green', 'green', 'green', 'green', 'green', 'green', 'red', 'red', 'red', 'red', 'red', 'red', 'red', 'red', 'red', 'red', 'pink', 'pink', 'red-2', 'red-2', 'green-2', 'green-2', 'turquoise', 'turquoise', 'blue-2', 'blue-2', 'blue-2', 'yellow-2', 'yellow-2', 'turquoise', 'turquoise', 'turquoise', 'turquoise', 'orange', 'orange', 'orange', 'orange', 'orange', 'orange', 'black', 'black', 'black', 'black', 'black', 'black', 'grey', 'grey', 'grey', 'grey', 'grey', 'grey', 'grey', 'grey', 'grey', 'grey', 'grey', 'grey', 'white', 'white', 'white', 'white', 'white', 'grey', 'grey', 'grey', 'grey', 'green', 'green', 'green', 'green', 'white', 'white', 'white', 'brown', 'brown', 'yellow', 'yellow', 'red', 'red', 'red', 'red', 'red', 'red', 'red', 'red', 'red', 'blue', 'blue', 'grey', 'grey', 'white', 'white', 'yellow', 'yellow', 'blue', 'blue', 'blue', 'blue', 'blue', 'black', 'black', 'turquoise', 'turquoise', 'red', 'red', 'red', 'red', 'red', 'red', 'green', 'green', 'green', 'green', 'green', 'violet', 'violet', 'blue', 'blue', 'blue', 'blue', 'blue', 'brown', 'brown', 'yellow', 'yellow', 'yellow', 'yellow', 'white', 'white', 'white', 'grey', 'grey', 'grey', 'grey', 'black', 'black', 'black', 'black', 'black', 'black', 'brown', 'brown', 'brown', 'violet', 'violet', 'violet', 'violet', 'violet', 'violet', 'turquoise', 'turquoise', 'violet', 'violet', 'grey', 'grey', 'grey', 'white', 'white', 'white', 'yellow', 'yellow', 'yellow', 'blue', 'blue', 'green', 'green', 'green', 'turquoise', 'turquoise', 'violet', 'violet', 'brown', 'brown', 'red', 'red'],
'Result' : ['10', '10', '11', '11', '12', '12', '13', '13', '14', '14', '17', '17', '17', '17', '17', '16', '16', '16', '4', '4', '4', '4', '4', '4', '4', '4', '4', '4', '4', '3', '3', '3', '3', '3', '3', '2', '2', '2', '2', '2', '2', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '21', '21', '22', '22', '23', '23', '9', '9', '24', '24', '24', '25', '25', '18', '18', '18', '18', '20', '20', '20', '20', '20', '20', '19', '19', '19', '19', '19', '19', '6', '6', '6', '6', '6', '6', '6', '6', '6', '6', '6', '6', '5', '5', '5', '5', '5', '16', '16', '16', '16', '12', '12', '12', '12', '15', '15', '15', '17', '17', '14', '14', '11', '11', '11', '11', '11', '11', '11', '11', '11', '13', '13', '6', '6', '5', '5', '4', '4', '3', '3', '3', '3', '3', '10', '10', '9', '9', '1', '1', '1', '1', '1', '1', '2', '2', '2', '2', '2', '8', '8', '3', '3', '3', '3', '3', '7', '7', '4', '4', '4', '4', '5', '5', '5', '6', '6', '6', '6', '20', '20', '20', '20', '20', '20', '7', '7', '7', '8', '8', '8', '8', '8', '8', '19', '19', '18', '18', '15', '15', '15', '14', '14', '14', '13', '13', '13', '12', '12', '11', '11', '11', '9', '9', '8', '8', '7', '7', '10', '10']
}
df = pd.DataFrame(data)
df = df.sort_values(by='ID') # This line sorts the column ID by alphabet
df['Color'] = pd.Categorical(df['Color'], colorcode)
df = pd.DataFrame(data)
print(df)
My problem is that I cannot figure out how to filter the columns (first Country, then Team) and then count up according to the color, starting from 1 for red and not start at 1 again for the next team as long as I am still in the same country.
Convert the color column to ordered categorical type then group the dataframe by the combination of Country
, Team
and Color
then use ngroup
to assign ordered group numbers to each unique combination. Then for each Country
rank the group numbers using dense
method
df['Color'] = pd.Categorical(df['Color'], colorcode, ordered=True)
df['Rank'] = df.groupby(['Country', 'Team', 'Color'], sort=True, dropna=False).ngroup()
df['Rank'] = df.groupby(['Country'])['Rank'].rank(method='dense').astype(int)
Result
ID Country Team Color Result Rank
139 12318718-001 Germany Team1 red 1 1
138 12318718-999 Germany Team1 red 1 1
144 12318714-999 Germany Team1 green 2 2
145 12318714-001 Germany Team1 green 2 2
129 12318725-999 Germany Team1 blue 3 3
130 12318725-004 Germany Team1 blue 3 3
131 12318725-003 Germany Team1 blue 3 3
132 12318725-002 Germany Team1 blue 3 3
133 12318725-001 Germany Team1 blue 3 3
127 12318724-999 Germany Team1 yellow 4 4
128 12318724-001 Germany Team1 yellow 4 4
126 12318716-001 Germany Team1 white 5 5
125 12318716-999 Germany Team1 white 5 5
123 12318721-999 Germany Team1 grey 6 6
124 12318721-001 Germany Team1 grey 6 6
156 12318727-999 Germany Team1 brown 7 7
157 12318727-001 Germany Team1 brown 7 7
149 12318713-999 Germany Team1 violet 8 8
150 12318713-001 Germany Team1 violet 8 8
136 12318715-999 Germany Team1 turquoise 9 9
137 12318715-001 Germany Team1 turquoise 9 9
134 12318726-999 Germany Team1 black 10 10
135 12318726-001 Germany Team1 black 10 10
115 12319319-006 Germany Team2 red 11 11
113 12319319-008 Germany Team2 red 11 11
112 12319319-999 Germany Team2 red 11 11
120 12319319-001 Germany Team2 red 11 11
119 12319319-002 Germany Team2 red 11 11
118 12319319-003 Germany Team2 red 11 11
117 12319319-004 Germany Team2 red 11 11
116 12319319-005 Germany Team2 red 11 11
114 12319319-007 Germany Team2 red 11 11
104 12319310-001 Germany Team2 green 12 12
102 12319310-003 Germany Team2 green 12 12
...