[SOLVED] Assign a number for every matching value in list

I have a long list of items, and I want to assign each item a number that increases by one every time the value in the list changes. Basically, I want to categorize the values in the list.

It can be assumed that equal values are always grouped together, but I don't know how many times each value repeats. The list is currently stored in a dataframe, and the output needs to be a dataframe as well. Example:

import pandas as pd

my_list = ['Apple', 'Apple', 'Orange', 'Orange', 'Orange', 'Banana']
grouping = pd.DataFrame(my_list, columns=['List'])

Expected output:

     List  Value
0   Apple      1
1   Apple      1
2  Orange      2
3  Orange      2
4  Orange      2
5  Banana      3

I have tried a for loop that checks whether the previous value is the same as the current one, but I imagine there is a nicer way of doing this.

Solution

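Use pandas.factorize. It returns a tuple (codes, uniques), where the codes number each distinct value from 0 in order of first appearance. Take element [0], and add 1 if you need the category numbers to start at 1 instead of 0: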

import pandas as pd

my_list = ['Apple', 'Apple', 'Orange', 'Orange', 'Orange', 'Banana']
grouping = pd.DataFrame(my_list, columns=['List'])

grouping['code'] = pd.factorize(grouping['List'])[0] + 1
print(grouping)

Output:

     List  code
0   Apple     1
1   Apple     1
2  Orange     2
3  Orange     2
4  Orange     2
5  Banana     3
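
Note that factorize numbers distinct values, not runs. That matches the question as long as equal values are always grouped together, as stated. If a value could reappear later in the list and should then get a new number, one possible alternative is to compare each row with the previous one and take a cumulative sum. The following is only a sketch under that assumption; the data and the column names by_value and by_run are made up for illustration:

```python
import pandas as pd

# Hypothetical data in which 'Apple' reappears after 'Orange'.
df = pd.DataFrame({'List': ['Apple', 'Apple', 'Orange', 'Apple', 'Banana']})

# factorize: one code per distinct value, in order of first appearance,
# so the second 'Apple' run reuses the code of the first one.
df['by_value'] = pd.factorize(df['List'])[0] + 1

# shift + ne + cumsum: the counter goes up by one on every row whose value
# differs from the previous row, so each consecutive run gets its own number.
df['by_run'] = df['List'].ne(df['List'].shift()).cumsum()

print(df)
```

On the original example, where no value reappears, both columns would come out identical, so factorize remains the simpler choice there.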