I have two dfs,
df1
ARHGEF10L HIF3A RNF17 RNF10 RNF11
NCBP1 NaN -0.432931 NaN -0.233554 0.165081
NCBP2 0.184332 -0.077655 0.331873 -0.449421 0.153836
RPL37 NaN NaN 0.192629 NaN -0.089123
DHX9 -0.115242 -0.133209 -0.207657 -0.267636 0.363868
TCOF1 NaN NaN 0.084838 0.140575 -0.122832
df2:
ARHGEF10L HIF3A RNF17 RNF10 RNF11
NCBP1 NaN 0.000067 NaN 0.038310 NaN
NCBP2 NaN NaN 0.002809 0.000033 NaN
RPL37 NaN NaN NaN NaN NaN
DHX9 NaN NaN NaN 0.017100 0.000979
TCOF1 NaN NaN NaN NaN NaN
Now I want to create a new df that colude 4 columns: gene1(row index of both df1 and df2), gene2(column index of both df1 and df2), value1(df1 value) and value2(df2 value)
So it would be like 25*4
gene1 gene2 value1 value2
NCBP1 ARHGEF10L NaN Nan
NCBP1 HIF3A -0.432931 0.000067
NCBP1 RNF17 NaN NaN
NCBP1 RNF10 -0.233554 0.038310
NCBP1 RNF11 0.165081 NaN
NCBP2 ARHGEF10L 0.184332 Nan
NCBP2 HIF3A -0.077655 NaN
NCBP2 RNF17 0.331873 0.002809
....
TCOF1 ARHGEF10L NaN NaN
TCOF1 HIF3A NaN NaN
...
I have this fake code but not sure for the middle part
def coef_fdr_table(df1,df2):
column_names = ['gene1','gene2', 'value1', 'value2']
df = pd.DataFrame(columns = column_names)
for i in range(25):
df.iloc[i,2] = df1[...]
df.iloc[i,3] = df2[...]
df.set_index(['gene1'],inplace = True)
return(df)
Any suggestion is appreciated!
You can just concat
the two dataframes with a MultiIndex and reshape:
(pd.concat({'value1': df1,
'value2': df2,
}, axis=1)
.stack(1, dropna=False) # value identifiers to rows
.rename_axis(['gene1', 'gene2']) # set future column names
.reset_index() # index to columns
)
output:
gene1 gene2 value1 value2
0 NCBP1 ARHGEF10L NaN NaN
1 NCBP1 HIF3A -0.432931 0.000067
2 NCBP1 RNF10 -0.233554 0.038310
3 NCBP1 RNF11 0.165081 NaN
4 NCBP1 RNF17 NaN NaN
5 NCBP2 ARHGEF10L 0.184332 NaN
...