pythonpandasmergemulti-index

Pandas Dataframe Multiindex Merge


Here is a hypothetical scenario with multiindex dataframes in pandas. Trying to merge them will result in an error. Do I have to do reset_index() on either dataframe to make this work?

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index1 = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
index2 = pd.MultiIndex.from_tuples(tuples, names=['third', 'fourth'])

s1 = pd.DataFrame(np.random.randn(8), index=index1, columns=['s1'])
s2 = pd.DataFrame(np.random.randn(8), index=index2, columns=['s2'])

Attempted merges:

s1.merge(s2, how='left', left_index=True, right_index=True)
s1.merge(s2, how='left', left_on=['first', 'second'], right_on=['third', 'fourth'])

Solution

  • Seems like you need to use a combination of them.

    s1.merge(s2, left_index=True, right_on=['third', 'fourth'])
    #s1.merge(s2, right_index=True, left_on=['first', 'second'])
    

    Output:

                   s1        s2
    bar one  0.765385 -0.365508
        two  1.462860  0.751862
    baz one  0.304163  0.761663
        two -0.816658 -1.810634
    foo one  1.891434  1.450081
        two  0.571294  1.116862
    qux one  1.056516 -0.052927
        two -0.574916 -1.197596