python numpy sorting labeling

Label a list following the unique elements appearing in it


Given a list of strings such as:

foo = ['A', 'A', 'B', 'A', 'B', 'C', 'C', 'A', 'B', 'C', 'A']

How can we label them such that the output would be:

output = ['A1', 'A2', 'B1', 'A3', 'B2', 'C1', 'C2', 'A4', 'B3', 'C3', 'A5']

(keeping the order of the original list)

In this case there are only 3 unique values to look at, so the first thing I tried was looking at the unique elements:

import numpy as np

np.unique(foo)

Output = ['A', 'B', 'C']

But then I get stuck when trying to write the loop that produces the desired output.


Solution

  • Using pure Python, use a dictionary to keep a running count of each value:

    foo = ['A', 'A', 'B', 'A', 'B', 'C', 'C', 'A', 'B', 'C', 'A']
    
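    d = {}    # running count of occurrences seen so far, per value
    out = []
    for val in foo:
        # increment the count for this value (starting from 0 if unseen)
        d[val] = d.get(val, 0) + 1
        out.append(f'{val}{d[val]}')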
    

    If you can use pandas:

    import pandas as pd
    
    s = pd.Series(foo)
    out = s.add(s.groupby(s).cumcount().add(1).astype(str)).tolist()
    

    Output: ['A1', 'A2', 'B1', 'A3', 'B2', 'C1', 'C2', 'A4', 'B3', 'C3', 'A5']
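
If you need this in more than one place, the dictionary-counting loop above could be wrapped in a small helper. The following is only a sketch: the function name `label_occurrences` is made up for illustration, and it swaps the plain dict for `collections.defaultdict` with one `itertools.count` per distinct value, which is an alternative to `d.get(val, 0) + 1` rather than a different algorithm.

```python
from collections import defaultdict
from itertools import count


def label_occurrences(values):
    """Append a per-value running count to each item, preserving input order.

    Hypothetical helper name, chosen for this example only.
    """
    # One independent counter (1, 2, 3, ...) is created lazily per distinct value.
    counters = defaultdict(lambda: count(1))
    return [f"{v}{next(counters[v])}" for v in values]


foo = ['A', 'A', 'B', 'A', 'B', 'C', 'C', 'A', 'B', 'C', 'A']
labels = label_occurrences(foo)
print(labels)
```

Like the loop it is based on, this makes a single pass over the list and needs no sorting, so the original order is kept.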