pythonpandasdataframemulti-indexrecord-linkage

Show all matched pairs in a single dataframe - Python Record Linkage


I have a pandas MultiIndex object :

In [0]: index
Out[0]: 
MultiIndex(levels=[[1, 2, 3, 8], [10, 11]],
       labels=[[0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 0, 1, 0, 1, 0, 1]])

This MultiIndex object defines the following 8 pairs : (1,10), (1,11), (2,10), (2,11), (3,10), (3,11), (8,10), (8,11).

The elements listed in the levels correspond to the index of a DataFrame:

In [1]: df
Out[1]: 
     col_1   col_2
0        0       1
1        2       3
2        4       5
3        6       7
4        8       9
5       10      11
6       12      13
7       14      15
8       16      17
9       18      19
10      20      21
11      22      23

What I would like is to create a new DataFrame that shows all the pairs defined above. Something that look like:

In [2]: result
Out[2]: 
    col_1   col_2     pair
        2       3        0
       20      21        0
        2       3        1
       22      23        1
        4       5        2
       20      21        2
        4       5        3
       22      23        3
        6       7        4
       20      21        4
        6       7        5
       22      23        5
       16      17        6
       20      21        6
       16      17        7
       22      23        7

Is there any efficient way to implement this? (if possible, without for-loops)

Thanks in advance


Solution

  • Using stack with iloc or reindex

    df.iloc[m.to_frame().stack()].assign(key=m.to_frame().reset_index(drop=True).stack().index.get_level_values(0))
    Out[205]: 
        col_1  col_2  key
    1       2      3    0
    10     20     21    0
    1       2      3    1
    11     22     23    1
    2       4      5    2
    10     20     21    2
    2       4      5    3
    11     22     23    3
    3       6      7    4
    10     20     21    4
    3       6      7    5
    11     22     23    5
    8      16     17    6
    10     20     21    6
    8      16     17    7
    11     22     23    7