pandasdataframelistmatching

Compare if a value from one dataframe is in the list of values of another


Consider these two dfs:

df1 = pd.DataFrame({
    'Id': {0: 101, 1: 102, 2: 103, 3: 104},
    'Number': {0: 'A1', 1: 'A2', 2: 'B1', 3: 'B1'}})

    Id  Number
0   101 A1
1   102 A2
2   103 B1
3   104 B1

df2 = pd.DataFrame({
    'Client': {0: 'John', 1: 'Mia', 2: 'Claudia'},
    'Number': {0: ['A1', 'B1'], 1: ['Z4'], 2: ['A2']}})

    Client  Number
0   John    [A1, B1]
1   Mia     [Z4]
2   Claudia [A2]

How do I check if values from df1["Number"] are in df2["Number"] and append all the corresponding ids from df1["Id"]? So the results are like this?

    Client  Number   Ids
0   John    [A1, B1] [101, 103, 104]
1   Mia     [Z4]     NaN
2   Claudia [A2]     [102]

Solution

  • You can use a custom function and indexing:

    def mapper(lst, ref):
        idx = [x for x in lst.strip('[]').split(', ') if x in ref.index]
        return ref.loc[idx].tolist() if idx else None
    
    df2['Ids'] = df2['Number'].apply(mapper, ref=df1.set_index('Number')['Id'])
    

    If your inputs in df2 are lists (not strings), simplify to:

    def mapper(lst, ref):
        idx = [x for x in lst if x in ref.index]
        return ref.loc[idx].tolist() if idx else None
    

    Variant using Index.intersection (suggested by @wjandrea):

    def mapper(lst, ref):
        idx = ref.index.intersection(lst)
        return None if idx.empty else ref.loc[idx].tolist()
    
    df2['Ids'] = df2['Number'].apply(mapper, ref=df1.set_index('Number')['Id'])
    

    Output:

        Client    Number              Ids
    0     John  [A1, B1]  [101, 103, 104]
    1      Mia      [Z4]             None
    2  Claudia      [A2]            [102]