pythonpandasdataframemerge

How to merge dataframes over multiple columns and split rows?


I have two datafames:

df1 = pd.DataFrame({
    'from': [0, 2, 8, 26, 35, 46],
    'to': [2, 8, 26, 35, 46, 48],
    'int': [2, 6, 18, 9, 11, 2]})

df2 = pd.DataFrame({
    'from': [0, 2, 8, 17, 34],
    'to': [2, 8, 17, 34, 49],
    'int': [2, 6, 9, 17, 15]})

I want to create a new dataframe that looks like this:

df = pd.DataFrame({
    'from': [0, 2, 8, 17, 26, 34, 35, 46, 48],
    'to': [2, 8, 17, 26, 34, 35, 46, 48, 49],
    'int': [2, 6, 9, 9, 8, 1, 11, 2, 1]})

I have tried standard merging scripts but have not been able to split the rows containing higher 'from' and 'to' numbers in either df1 or df2 into smaller ones.

How can I merge my dataframes over multiple columns and split rows?


Solution

  • Frirst, combine all unique from and to values from both df1 and df2 to create a set of breakpoints:

    breakpoints = set(df1['from']).union(df1['to']).union(df2['from']).union(df2['to'])
    breakpoints = sorted(breakpoints)
    

    In the example, this is [0, 2, 8, 17, 26, 34, 35, 46, 48, 49]. Now, create a new dataframe with these from and to values, then compute the intervals:

    new_df = pd.DataFrame({'from': breakpoints[:-1], 'to': breakpoints[1:]})
    new_df['int'] = new_df['to'] - new_df['from']
    

    Result:

      from  to  int
    0    0   2    2
    1    2   8    6
    2    8  17    9
    3   17  26    9
    4   26  34    8
    5   34  35    1
    6   35  46   11
    7   46  48    2
    8   48  49    1