I have a long list of items, and I want to assign each one a number that increases by one every time the value in the list changes. Basically, I want to categorize the values in the list.
It can be assumed that identical values are always grouped together, but I don't know how many times each value repeats. The list is currently stored in a dataframe, and the output needs to be a dataframe as well. Example:
my_list = ['Apple', 'Apple', 'Orange', 'Orange','Orange','Banana']
grouping = pd.DataFrame(my_list, columns=['List'])
Expected output:
List Value
0 Apple 1
1 Apple 1
2 Orange 2
3 Orange 2
4 Orange 2
5 Banana 3
I have tried a for loop that checks whether the previous value is the same as the current value, but I imagine there should be a nicer way of doing this.
Use pandas.factorize, and add 1 if you need the category numbers to start with 1 instead of 0:
import pandas as pd
my_list = ['Apple', 'Apple', 'Orange', 'Orange','Orange','Banana']
grouping = pd.DataFrame(my_list, columns=['List'])
grouping['code'] = pd.factorize(grouping['List'])[0] + 1
print(grouping)
Output:
List code
0 Apple 1
1 Apple 1
2 Orange 2
3 Orange 2
4 Orange 2
5 Banana 3
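Note that factorize numbers distinct values, not runs. That matches the question because identical values are stated to be grouped together. If a value could reappear later in the list and should then get a new number, a shift-and-cumsum comparison may be closer to "increase by one every time the value changes". The following is a minimal sketch of that alternative; the sample data and the column names by_value and by_run are made up for illustration and are not from the question:

```python
import pandas as pd

# Hypothetical data in which 'Apple' reappears after 'Orange'.
items = ['Apple', 'Apple', 'Orange', 'Orange', 'Apple', 'Banana']
df = pd.DataFrame(items, columns=['List'])

# factorize assigns one code per distinct value,
# so both runs of 'Apple' share the same code.
df['by_value'] = pd.factorize(df['List'])[0] + 1

# Comparing each row with the previous row marks the start of every run;
# the cumulative sum of those marks numbers the runs starting from 1.
df['by_run'] = df['List'].ne(df['List'].shift()).cumsum()

print(df)
```

On data where each value appears in a single contiguous block, as in the question, both columns come out identical, so the choice only matters when values can recur.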