python, pandas, list, categorical-data

Assign a number for every matching value in list


I have a long list of items, and I want to assign each item a number that increases by one every time the value in the list changes. Basically, I want to categorize the values in the list.

It can be assumed that identical values in the list are always grouped together, but I don't know how many times each value repeats. The list is currently stored in a dataframe, and the output also needs to be a dataframe. Example:

import pandas as pd

my_list = ['Apple', 'Apple', 'Orange', 'Orange', 'Orange', 'Banana']
grouping = pd.DataFrame(my_list, columns=['List'])

Expected output:

     List  Value
0   Apple      1
1   Apple      1
2  Orange      2
3  Orange      2
4  Orange      2
5  Banana      3

I have tried a for loop that checks whether the previous value is the same as the current value, but I imagine there is a nicer way of doing this.
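For reference, the loop described above might look roughly like this. It is a hypothetical reconstruction, not the asker's actual code: it walks the column once and bumps a counter whenever the current value differs from the previous one.

```python
import pandas as pd

my_list = ['Apple', 'Apple', 'Orange', 'Orange', 'Orange', 'Banana']
grouping = pd.DataFrame(my_list, columns=['List'])

values = []
current = 0
previous = None  # sentinel: the first row always starts a new category
for item in grouping['List']:
    # Start a new category whenever the value differs from the previous row
    if item != previous:
        current += 1
        previous = item
    values.append(current)

grouping['Value'] = values
print(grouping)
```

This works, but it runs a Python-level loop over every row, which is what the vectorized pandas approaches avoid.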


Solution

  • Use pandas.factorize, which numbers the distinct values in order of first appearance (starting at 0). Add 1 if you need the category numbers to start with 1 instead of 0:

    import pandas as pd
    
    my_list = ['Apple', 'Apple', 'Orange', 'Orange', 'Orange', 'Banana']
    grouping = pd.DataFrame(my_list, columns=['List'])
    
    grouping['code'] = pd.factorize(grouping['List'])[0] + 1
    print(grouping)
    

    Output:

         List  code
    0   Apple     1
    1   Apple     1
    2  Orange     2
    3  Orange     2
    4  Orange     2
    5  Banana     3
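Note that `factorize` gives each *distinct* value one number, so it matches "increase by one every time the value changes" only when every value occupies a single contiguous block, as the question assumes. If that assumption might not hold, a common alternative (a different technique from `factorize`) is to compare each row with the row above using `shift` and take a cumulative sum of the changes. A minimal sketch:

```python
import pandas as pd

my_list = ['Apple', 'Apple', 'Orange', 'Orange', 'Orange', 'Banana']
grouping = pd.DataFrame(my_list, columns=['List'])

s = grouping['List']
# True on the first row (compared against NaN) and wherever the value
# differs from the row above
changed = s.ne(s.shift())
# A running count of the changes numbers each run: 1, 1, 2, 2, 2, 3
grouping['Value'] = changed.cumsum()
print(grouping)

# The two approaches differ once a value reappears later in the list
other = pd.Series(['Apple', 'Orange', 'Apple'])
print(pd.factorize(other)[0] + 1)                  # [1 2 1]
print(other.ne(other.shift()).cumsum().tolist())   # [1, 2, 3]
```

Pick `factorize` if a value that reappears should keep its original number, and the `shift`/`cumsum` version if every new run should get a new number.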