Say I have a dataset with multiindex i1 i2 i3 and columns A B1 .. B_n where A stores the i3 level index of a value I want to look up from another observation. I want to use this value to look up the B2 value. There can be pointers to a value not in the df or multiple pointers to the same value. However I am not getting what I want and suspect merging might not be the best way of doing this.
For example row (1,10,102) should have looked up i3=101's B2_x value here:
A B1 B2_x c1 B2_y b2_wanted
i1 i2 i3
1 10 101 103 2 1 1 2.0 2.0
102 101 2 2 2 NaN 1.0
103 100 1 2 2 1.0 NaN
11 100 102 1 2 2 NaN NaN
101 103 1 2 2 NaN NaN
12 101 103 2 1 2 1.0 1.0
103 101 1 1 2 1.0 1.0
2 10 100 103 1 2 1 NaN NaN
101 103 1 1 2 1.0 NaN
102 101 1 1 1 NaN 1.0
This is the code for my attempt
# generate test df
import pandas as pd
import numpy as np
np.random.seed(2)
range1 = range(1, 4)
range2 = range(10, 13)
range3 = range(100, 104)
multi_index = pd.MultiIndex.from_product([range1, range2, range3], names=['i1', 'i2', 'i3'])
df = pd.DataFrame(index=multi_index, data={'A': np.random.randint(100, 104, size=len(multi_index)),
'B1': np.random.randint(1, 3, size=len(multi_index)),
'B2': np.random.randint(1, 3, size=len(multi_index)),
'c1': np.random.randint(1, 3, size=len(multi_index))})
df = df[df.index.get_level_values("i3")!=df["A"]].head(10).copy()
# attempt to solve starts here
vars_to_lookup=df[["A","B2"]].copy().reset_index().set_index(["i1","i2","A"])
vars_to_lookup.index.names=["i1","i2","i3"]
vars_to_lookup.drop("i3",axis=1,inplace=True)
temp_df=pd.merge(df,vars_to_lookup,left_index=True,right_index=True,how="left")
temp_df["b2_wanted"]= pd.Series([2,1,np.nan,np.nan,np.nan,1,1,np.nan,np.nan,1 ],index=df.index)
Here's one approach:
df['B2_wanted'] = (
df.reindex(
df.droplevel('i3').set_index('A', append=True).index
)['B2'].to_numpy()
)
Output:
A B1 B2 c1 B2_wanted
i1 i2 i3
1 10 101 103 2 1 1 2.0
102 101 2 2 2 1.0
103 100 1 2 2 NaN
11 100 102 1 2 2 NaN
101 103 1 2 2 NaN
12 101 103 2 1 2 1.0
103 101 1 1 2 1.0
2 10 100 103 1 2 1 NaN
101 103 1 1 2 NaN
102 101 1 1 1 1.0
Explanation
df.droplevel
+ df.set_index
to exchange level 'i3' for column 'A' and select df.index
.df.reindex
and assign 'B2' values via Series.to_numpy
.