I have 5 sheets in excel with different parameters.
history
idx history
1 1daybehind
recorded
idx recorded
1 daily
optmethod
idx opt optmethod
1 backprop x1
2 convex x2
3 monte x3
4 monte x4
optpara
idx optpara
1 x1x2
2 x3x4
3 x1x4
4 x2x3
filter
idx filter
1 x1>0
2 x2>0
3 x3>0
4 x4>0
I want to create a permutation of the row entries so I want to end up with the following sheet with all possible scenarios. This is just the first 6 rows.
scenario history recorded optmethod optpara filter
1 1 1 1 1 1
2 1 1 1 1 2
3 1 1 1 1 3
4 1 1 1 1 4
5 1 1 1 2 1
6 1 1 1 2 2
...
So the first row, scenario 1
will be 1
1daybehind, 1
daily, 1
backprop, 1
x1x2, 1
x1>1
I tried the following code,
for name,sheet in sheet_dict.items():
print(name)
if name == 'history':
sndf = sheet
sndf = sndf[['idx']]
sndf = sndf.rename(columns={'idx':name})
else:
sndf['key'] = 1
sheet = sheet[['idx']]
sheet = sheet.rename(columns={'idx':name})
sheet['key'] = 1
sndf = pd.merge(sndf, sheet, on ='key').drop("key", 1)
sndf.index.names = ['scenario']
sndf.to_csv('scenarionum.csv',index=True)
But I end up with this. I have the correct number of rows but each cell is just filled with 1
s.
scenario history recorded optmethod optpara filter
0 1 1 1 1 1
1 1 1 1 1 1
2 1 1 1 1 1
3 1 1 1 1 1
4 1 1 1 2 1
5 1 1 1 2 1
I believe the answer to this problem is a cross join but I'm not sure how I can go about doing it.
What am I doing wrong and how do i fix it???
If idx
is the index of your dataframes:
indexes = [ data.index.tolist() for data in sheet_dict.values()]
else if idx
is a simple column of your dataframes:
indexes = [ data["idx"].tolist() for data in sheet_dict.values()]
Generate all combinations
sndf = pd.MultiIndex.from_product(indexes, names=sheet_dict.keys()) \
.to_frame(index=False)
.rename_axis("scenario")
sndf.index += 1
>>> sndf
history recorded optmethod optpara filter
scenario
1 1 1 1 1 1
2 1 1 1 1 2
3 1 1 1 1 3
4 1 1 1 1 4
5 1 1 1 2 1
... ... ... ... ... ...
60 1 1 4 3 4
61 1 1 4 4 1
62 1 1 4 4 2
63 1 1 4 4 3
64 1 1 4 4 4
[64 rows x 5 columns]
Update: alternative methods
With cartesian_product
from pandas.core.reshape.util
from pandas.core.reshape.util import cartesian_product
sndf = pd.DataFrame(list(zip(*cartesian_product(indexes))),
columns=sheet_dict.keys()).rename_axis("scenario")
sndf.index += 1
With product
from itertools
from itertools import product
sndf = pd.DataFrame(product(*indexes),
columns=sheet_dict.keys()).rename_axis("scenario")
sndf.index += 1