I have a data frame [df] that looks like this but much lagger:
title of the novel author publishing year mentionned cities
0 Beasts and creatures Bruno Ivory 2021 New York
0 Monsters Renata Mcniar 2023 New York
0 At risk Charles Dobi 2020 London
0 Manuela and Ricardo Lucas Zacci 2022 Rio de Janeiro
0 War against the machine Angelina Trotter 1999 Rio de Janeiro
I would like to add another column with the objective of counting all the occurences of the cities. The problem is that I want to maintain the year of that occurrence, as I work with history. In other words, it is important for me to be able to know when the city was mentionned.
The expected outcome would look like this:
title of the novel author publishing year mentionned cities Counter
0 Beasts and creatures Bruno Ivory 2021 New York 1
0 Monsters Renata Mcniar 2022 New York 2
0 At risk Charles Dobi 2020 London 1
0 Manuela and Ricardo Lucas Zacci 2022 Rio de Janeiro 1
0 War against the machine Angelina Trotter 1999 Rio de Janeiro 2
So far, I have just managed to count all the occurrences, but I could not relate it to the publishing years. The code I am using is:
df ['New York'] = df.eq('New York').sum().to_frame().T
Can someone help me, please?
edit:
I tried joining two dataframes and I got something interesting but not what I really wanted. The problem is that it does not keep the Publishing year on track.
d[f'counter'] = d.fgroupby('mentionned cities')['mentionned cities'].transform('counter')
result = pd.concat([df['New York'], df], axis=1, join='inner')
display(result)
Output:
title of the novel author publishing year mentionned cities Counter
0 Beasts and creatures Bruno Ivory 2021 New York 2
0 Monsters Renata Mcniar 2023 New York 2
0 At risk Charles Dobi 2020 London 1
0 Manuela and Ricardo Lucas Zacci 2022 Rio de Janeiro 1
0 War against the machine Angelina Trotter 1999 Rio de Janeiro 2
The problem still lingers on
Perhaps you could use a for loop to iterate through the 'mentioned cities' column, and use a dict to count the occurrences of cities:
city_count = {}
count_column = []
for city in df['mentioned cities']:
city_count[city] = city_count.get(city, 0) + 1
count_column.append(city_count[city])
df['Counter'] = count_column