pythonpandas

pandas how to use columns value as pointer to another multindex to look up values


Say I have a dataset with multiindex i1 i2 i3 and columns A B1 .. B_n where A stores the i3 level index of a value I want to look up from another observation. I want to use this value to look up the B2 value. There can be pointers to a value not in the df or multiple pointers to the same value. However I am not getting what I want and suspect merging might not be the best way of doing this.

For example row (1,10,102) should have looked up i3=101's B2_x value here:

            A  B1  B2_x  c1  B2_y  b2_wanted
i1 i2 i3                                     
1  10 101  103   2     1   1   2.0        2.0
      102  101   2     2   2   NaN        1.0
      103  100   1     2   2   1.0        NaN
   11 100  102   1     2   2   NaN        NaN
      101  103   1     2   2   NaN        NaN
   12 101  103   2     1   2   1.0        1.0
      103  101   1     1   2   1.0        1.0
2  10 100  103   1     2   1   NaN        NaN
      101  103   1     1   2   1.0        NaN
      102  101   1     1   1   NaN        1.0

This is the code for my attempt

# generate test df
import pandas as pd
import numpy as np
np.random.seed(2)
range1 = range(1, 4) 
range2 = range(10, 13) 
range3 = range(100, 104)  
multi_index = pd.MultiIndex.from_product([range1, range2, range3], names=['i1', 'i2', 'i3'])

df = pd.DataFrame(index=multi_index, data={'A': np.random.randint(100, 104, size=len(multi_index)),
                                           'B1': np.random.randint(1, 3, size=len(multi_index)),
                                           'B2': np.random.randint(1, 3, size=len(multi_index)),
                                           'c1': np.random.randint(1, 3, size=len(multi_index))})

df = df[df.index.get_level_values("i3")!=df["A"]].head(10).copy()

# attempt to solve starts here

vars_to_lookup=df[["A","B2"]].copy().reset_index().set_index(["i1","i2","A"])
vars_to_lookup.index.names=["i1","i2","i3"]
vars_to_lookup.drop("i3",axis=1,inplace=True)

temp_df=pd.merge(df,vars_to_lookup,left_index=True,right_index=True,how="left")
temp_df["b2_wanted"]= pd.Series([2,1,np.nan,np.nan,np.nan,1,1,np.nan,np.nan,1 ],index=df.index)

Solution

  • Here's one approach:

    df['B2_wanted'] = (
        df.reindex(
            df.droplevel('i3').set_index('A', append=True).index
        )['B2'].to_numpy()
    )
    

    Output:

                 A  B1  B2  c1  B2_wanted
    i1 i2 i3                             
    1  10 101  103   2   1   1        2.0
          102  101   2   2   2        1.0
          103  100   1   2   2        NaN
       11 100  102   1   2   2        NaN
          101  103   1   2   2        NaN
       12 101  103   2   1   2        1.0
          103  101   1   1   2        1.0
    2  10 100  103   1   2   1        NaN
          101  103   1   1   2        NaN
          102  101   1   1   1        1.0
    

    Explanation