[SOLVED] How to merge dataframes over multiple columns and split rows?

How to merge dataframes over multiple columns and split rows?

I have two datafames:

df1 = pd.DataFrame({
    'from': [0, 2, 8, 26, 35, 46],
    'to': [2, 8, 26, 35, 46, 48],
    'int': [2, 6, 18, 9, 11, 2]})

df2 = pd.DataFrame({
    'from': [0, 2, 8, 17, 34],
    'to': [2, 8, 17, 34, 49],
    'int': [2, 6, 9, 17, 15]})

I want to create a new dataframe that looks like this:

df = pd.DataFrame({
    'from': [0, 2, 8, 17, 26, 34, 35, 46, 48],
    'to': [2, 8, 17, 26, 34, 35, 46, 48, 49],
    'int': [2, 6, 9, 9, 8, 1, 11, 2, 1]})

I have tried standard merging scripts but have not been able to split the rows containing higher 'from' and 'to' numbers in either df1 or df2 into smaller ones.

How can I merge my dataframes over multiple columns and split rows?

Solution

Frirst, combine all unique from and to values from both df1 and df2 to create a set of breakpoints:

breakpoints = set(df1['from']).union(df1['to']).union(df2['from']).union(df2['to'])
breakpoints = sorted(breakpoints)

In the example, this is [0, 2, 8, 17, 26, 34, 35, 46, 48, 49]. Now, create a new dataframe with these from and to values, then compute the intervals:

new_df = pd.DataFrame({'from': breakpoints[:-1], 'to': breakpoints[1:]})
new_df['int'] = new_df['to'] - new_df['from']

Result:

  from  to  int
0    0   2    2
1    2   8    6
2    8  17    9
3   17  26    9
4   26  34    8
5   34  35    1
6   35  46   11
7   46  48    2
8   48  49    1