I have 2 dataframes
df1:
StartLocation,StartDevice,StartPort,EndLocation,EndDevice,EndPort,LinkType,Speed
DD1,Switch1,P1,AD1,Switch2,P2,MTP,1000
DD2,Switch2,P3,AD2,Switch3,P2,MTP,1000
DD3,Switch3,P5,AD3,Switch4,P6,MTP,1000
df2:
StartLocation,StartDevice,StartPort,EndLocation,EndDevice,EndPort
AB11,RU15,P1,AJ11,RU25,P2
AB12,RU18,P2,AB11,RU35,P2
AB13,RU19,P3,AB11,RU40,P4
I want to interleave the two dataframes and I've tried a few options but can't seem to get it to work. I'm close to the functionality with the below code but it doesn't join on the appropriate columns
import pandas as pd
from toolz import interleave
df3 = pd.DataFrame(interleave([df1.values, df2.values]), columns=df1)
Expected Output would look like
StartLocation,StartDevice,StartPort,EndLocation,EndDevice,EndPort,LinkType,Speed
DD1,Switch1,P1,AD1,Switch2,P2,MTP,1000
AB11,RU15,P1,AJ11,RU25,P2,nan,nan
DD2,Switch2,P3,AD2,Switch3,P2,MTP,1000
AB12,RU18,P2,AB11,RU35,P2,nan,nan
DD3,Switch3,P5,AD3,Switch4,P6,MTP,1000
AB13,RU19,P3,AB11,RU40,P4,nan,nan
I think it should be pretty simple but I can't find the appropriate syntax. Can anyone give any ideas?
Thanks in advance for the help!
If columns names are same, only difference is some new columns names in one of DataFrame is possible use:
df3 = pd.DataFrame(interleave([df1.values, df2.values]), columns=df1.columns)
print (df3)
StartLocation StartDevice StartPort EndLocation EndDevice EndPort LinkType \
0 DD1 Switch1 P1 AD1 Switch2 P2 MTP
1 AB11 RU15 P1 AJ11 RU25 P2 None
2 DD2 Switch2 P3 AD2 Switch3 P2 MTP
3 AB12 RU18 P2 AB11 RU35 P2 None
4 DD3 Switch3 P5 AD3 Switch4 P6 MTP
5 AB13 RU19 P3 AB11 RU40 P4 None
Speed
0 1000.0
1 NaN
2 1000.0
3 NaN
4 1000.0
5 NaN
More general solution working for any columns names is use DataFrame.align
before for prevent correct align columns for each DataFrame:
print (df1)
EndDevice EndLocation EndPort LinkType Speed StartDevice StartLocation \
0 Switch2 AD1 P2 MTP 1000 Switch1 DD1
1 Switch3 AD2 P2 MTP 1000 Switch2 DD2
2 Switch4 AD3 P6 MTP 1000 Switch3 DD3
StartPort
0 P1
1 P3
2 P5
print (df2)
EndDevice EndLocation EndPort LinkType Speed StartDevice StartLocation \
0 RU25 AJ11 P2 NaN NaN RU15 AB11
1 RU35 AB11 P2 NaN NaN RU18 AB12
2 RU40 AB11 P4 NaN NaN RU19 AB13
StartPort
0 P1
1 P2
2 P3
df3 = pd.DataFrame(interleave([df1.values, df2.values]), columns=df1.columns)
print (df3)
EndDevice EndLocation EndPort LinkType Speed StartDevice StartLocation \
0 Switch2 AD1 P2 MTP 1000.0 Switch1 DD1
1 RU25 AJ11 P2 NaN NaN RU15 AB11
2 Switch3 AD2 P2 MTP 1000.0 Switch2 DD2
3 RU35 AB11 P2 NaN NaN RU18 AB12
4 Switch4 AD3 P6 MTP 1000.0 Switch3 DD3
5 RU40 AB11 P4 NaN NaN RU19 AB13
StartPort
0 P1
1 P1
2 P3
3 P2
4 P5
5 P3
Another idea with Index.union
and DataFrame.reindex
:
cols = df1.columns.union(df2.columns, sort=False)
df1 = df1.reindex(cols, axis=1)
df2 = df2.reindex(cols, axis=1)
print (df1)
StartLocation StartDevice StartPort EndLocation EndDevice EndPort LinkType \
0 DD1 Switch1 P1 AD1 Switch2 P2 MTP
1 DD2 Switch2 P3 AD2 Switch3 P2 MTP
2 DD3 Switch3 P5 AD3 Switch4 P6 MTP
Speed
0 1000
1 1000
2 1000
print (df2)
StartLocation StartDevice StartPort EndLocation EndDevice EndPort LinkType \
0 AB11 RU15 P1 AJ11 RU25 P2 NaN
1 AB12 RU18 P2 AB11 RU35 P2 NaN
2 AB13 RU19 P3 AB11 RU40 P4 NaN
Speed
0 NaN
1 NaN
2 NaN
df3 = pd.DataFrame(interleave([df1.values, df2.values]), columns=cols)
print (df3)
StartLocation StartDevice StartPort EndLocation EndDevice EndPort LinkType \
0 DD1 Switch1 P1 AD1 Switch2 P2 MTP
1 AB11 RU15 P1 AJ11 RU25 P2 NaN
2 DD2 Switch2 P3 AD2 Switch3 P2 MTP
3 AB12 RU18 P2 AB11 RU35 P2 NaN
4 DD3 Switch3 P5 AD3 Switch4 P6 MTP
5 AB13 RU19 P3 AB11 RU40 P4 NaN
Speed
0 1000.0
1 NaN
2 1000.0
3 NaN
4 1000.0
5 NaN