I have two datafames:
df1 = pd.DataFrame({
'from': [0, 2, 8, 26, 35, 46],
'to': [2, 8, 26, 35, 46, 48],
'int': [2, 6, 18, 9, 11, 2]})
df2 = pd.DataFrame({
'from': [0, 2, 8, 17, 34],
'to': [2, 8, 17, 34, 49],
'int': [2, 6, 9, 17, 15]})
I want to create a new dataframe that looks like this:
df = pd.DataFrame({
'from': [0, 2, 8, 17, 26, 34, 35, 46, 48],
'to': [2, 8, 17, 26, 34, 35, 46, 48, 49],
'int': [2, 6, 9, 9, 8, 1, 11, 2, 1]})
I have tried standard merging scripts but have not been able to split the rows containing higher 'from' and 'to' numbers in either df1
or df2
into smaller ones.
How can I merge my dataframes over multiple columns and split rows?
Frirst, combine all unique from
and to
values from both df1
and df2
to create a set of breakpoints:
breakpoints = set(df1['from']).union(df1['to']).union(df2['from']).union(df2['to'])
breakpoints = sorted(breakpoints)
In the example, this is [0, 2, 8, 17, 26, 34, 35, 46, 48, 49]
. Now, create a new dataframe with these from
and to
values, then compute the intervals:
new_df = pd.DataFrame({'from': breakpoints[:-1], 'to': breakpoints[1:]})
new_df['int'] = new_df['to'] - new_df['from']
Result:
from to int
0 0 2 2
1 2 8 6
2 8 17 9
3 17 26 9
4 26 34 8
5 34 35 1
6 35 46 11
7 46 48 2
8 48 49 1