
Find closest element in list for each row in Pandas DataFrame column


I have a Pandas DataFrame and a comparison list like this:

In [21]: df
Out[21]: 
   Results
0       90
1       80
2       70
3       60
4       50
5       40
6       30
7       20
8       10

In [23]: comparation_list
Out[23]: [83, 72, 65, 40, 36, 22, 15, 12]

Now I want to create a new column in this df where each row's value is the element of comparation_list that is closest to that row's Results value.

The output should be something like this:

   Results   assigned_value
0       90               83
1       80               83
2       70               72
3       60               65
4       50               40
5       40               40
6       30               36
7       20               22
8       10               12

Loops or apply come to mind straight away, but I would like to know how to do this in a vectorized way.
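For reference, the loop/apply approach mentioned above might look like the following minimal sketch. It rebuilds the example data shown in the question and, for each value, picks the list element with the smallest absolute difference. It is not vectorized and serves only as a baseline to check faster solutions against.

```python
import pandas as pd

# Example data from the question.
df = pd.DataFrame({'Results': [90, 80, 70, 60, 50, 40, 30, 20, 10]})
comparation_list = [83, 72, 65, 40, 36, 22, 15, 12]

# Non-vectorized baseline: for each Results value, choose the list element
# with the smallest absolute difference (min keeps the first one on ties).
df['assigned_value'] = df['Results'].apply(
    lambda x: min(comparation_list, key=lambda c: abs(c - x))
)
print(df)
```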


Solution

  • Use pd.merge_asof with direction='nearest':

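    import pandas as pd

    # merge_asof requires both sides to be sorted on their join keys, so sort
    # df (keeping the original index as a column) and sort the list.
    out = pd.merge_asof(
        df.reset_index().sort_values(by='Results'),
        pd.Series(sorted(comparation_list), name='assigned_value'),
        left_on='Results', right_on='assigned_value',
        direction='nearest'  # match the closest key, searching both directions
    ).set_index('index').sort_index()  # restore the original row order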
    

    Output:

           Results  assigned_value
    index                         
    0           90              83
    1           80              83
    2           70              72
    3           60              65
    4           50              40
    5           40              40
    6           30              36
    7           20              22
    8           10              12
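As an alternative (a different technique from the merge_asof answer, not part of it), plain NumPy broadcasting is another vectorized option. The sketch below computes the full matrix of absolute differences between every Results value and every list element, then takes the argmin per row. It uses O(len(df) × len(list)) memory, so it suits a short comparison list; argmin returns the first minimum, so ties go to whichever element appears first in comparation_list.

```python
import numpy as np
import pandas as pd

# Example data from the question.
df = pd.DataFrame({'Results': [90, 80, 70, 60, 50, 40, 30, 20, 10]})
comparation_list = [83, 72, 65, 40, 36, 22, 15, 12]

values = np.asarray(comparation_list)
# Pairwise absolute distances, shape (len(df), len(values)), via broadcasting.
dist = np.abs(df['Results'].to_numpy()[:, None] - values[None, :])
# For each row, take the list element at the position of the smallest distance.
df['assigned_value'] = values[dist.argmin(axis=1)]
print(df)
```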