python pandas

Pandas: how to map values in a column of lists?


A column in a pandas dataframe contains lists of values.

Using a dictionary, I would like to create a new column in which each list holds the mapped values. Any value that is not a key in the dictionary should be dropped.

Here is a minimal example:

Setting up the dataframe:

import pandas as pd

df = pd.DataFrame(data={'B': ['x', 'y', 'z']})
# Replace each placeholder cell with a list
df.at[0, 'B'] = ['jki', 'gg4', 'k6k']
df.at[1, 'B'] = ['2f4', 'gg4', 'g24']
df.at[2, 'B'] = ['1k1', 'g24', '1k1', '2f4']

Results in

df

                      B
0       [jki, gg4, k6k]
1       [2f4, gg4, g24]
2  [1k1, g24, 1k1, 2f4]

Setting up the dictionary:

conv = {'jki': 1, 'gg4': 2, '2f4': 3, 'g24': 4}

If the column held single values rather than lists, this code would work:

df['MappedA'] = df.B.map(conv)

But since each cell is a list, that code cannot be used.

Here's what I would like the result to be:

                      B    MappedA
0       [jki, gg4, k6k]     [1, 2]
1       [2f4, gg4, g24]  [3, 2, 4]
2  [1k1, g24, 1k1, 2f4]     [4, 3]

Solution

  • Use a nested list comprehension that keeps only the keys present in the dictionary. Note that assign returns a new dataframe and leaves df unchanged:

    df.assign(mapped=[[conv[k] for k in row if k in conv] for row in df.B])
    

                          B     mapped
    0       [jki, gg4, k6k]     [1, 2]
    1       [2f4, gg4, g24]  [3, 2, 4]
    2  [1k1, g24, 1k1, 2f4]     [4, 3]
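
If the new column should be stored on df itself under the name MappedA, as in the question, the same idea can be sketched as below. The helper name map_list is hypothetical and only used for illustration. It tests membership with `in`, so a key mapped to a falsy value such as 0 would still be kept.

```python
import pandas as pd

# Same data as in the question, built directly as a column of lists
df = pd.DataFrame({'B': [['jki', 'gg4', 'k6k'],
                         ['2f4', 'gg4', 'g24'],
                         ['1k1', 'g24', '1k1', '2f4']]})
conv = {'jki': 1, 'gg4': 2, '2f4': 3, 'g24': 4}


def map_list(values, mapping):
    """Map each value through `mapping`, dropping values that are not keys."""
    return [mapping[v] for v in values if v in mapping]


# Assign in place; one mapped list per row, original order preserved
df['MappedA'] = [map_list(row, conv) for row in df['B']]
print(df)
```

A plain Python loop over the rows is usually adequate here, since a column of lists has object dtype and does not benefit from vectorized operations anyway.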