pandaspermutationscenarios

Use python pandas rows to create permutations to create all possible scenarios


I have 5 sheets in excel with different parameters.

history

idx  history
1    1daybehind

recorded

idx   recorded
1     daily

optmethod

idx   opt         optmethod
1     backprop    x1
2     convex      x2
3     monte       x3
4     monte       x4 

optpara

idx   optpara   
1     x1x2    
2     x3x4    
3     x1x4  
4     x2x3  

filter

idx   filter   
1     x1>0    
2     x2>0    
3     x3>0  
4     x4>0  

I want to create a permutation of the row entries so I want to end up with the following sheet with all possible scenarios. This is just the first 6 rows.

scenario history recorded optmethod optpara filter
1        1       1        1         1        1
2        1       1        1         1        2
3        1       1        1         1        3
4        1       1        1         1        4
5        1       1        1         2        1
6        1       1        1         2        2
...

So the first row, scenario 1 will be 1 1daybehind, 1 daily, 1 backprop, 1 x1x2, 1 x1>1

I tried the following code,

for name,sheet in sheet_dict.items():
    print(name)
    if name == 'history':
        sndf = sheet
        sndf = sndf[['idx']]
        sndf = sndf.rename(columns={'idx':name})
    else: 
        sndf['key'] = 1
        sheet = sheet[['idx']]
        sheet = sheet.rename(columns={'idx':name})
        sheet['key'] = 1
        sndf = pd.merge(sndf, sheet, on ='key').drop("key", 1)
sndf.index.names = ['scenario']
sndf.to_csv('scenarionum.csv',index=True)

But I end up with this. I have the correct number of rows but each cell is just filled with 1s.

scenario history recorded optmethod optpara filter
0        1       1        1         1        1
1        1       1        1         1        1
2        1       1        1         1        1
3        1       1        1         1        1
4        1       1        1         2        1
5        1       1        1         2        1

I believe the answer to this problem is a cross join but I'm not sure how I can go about doing it.

What am I doing wrong and how do i fix it???


Solution

  • If idx is the index of your dataframes:

    indexes = [ data.index.tolist() for data in sheet_dict.values()]
    

    else if idx is a simple column of your dataframes:

    indexes = [ data["idx"].tolist() for data in sheet_dict.values()]
    

    Generate all combinations

    sndf = pd.MultiIndex.from_product(indexes, names=sheet_dict.keys()) \
                        .to_frame(index=False)
                        .rename_axis("scenario")
    sndf.index += 1
    
    >>> sndf
              history  recorded  optmethod  optpara  filter
    scenario
    1               1         1          1        1       1
    2               1         1          1        1       2
    3               1         1          1        1       3
    4               1         1          1        1       4
    5               1         1          1        2       1
    ...           ...       ...        ...      ...     ...
    60              1         1          4        3       4
    61              1         1          4        4       1
    62              1         1          4        4       2
    63              1         1          4        4       3
    64              1         1          4        4       4
    
    [64 rows x 5 columns]
    

    Update: alternative methods

    With cartesian_product from pandas.core.reshape.util

    from pandas.core.reshape.util import cartesian_product
    
    sndf = pd.DataFrame(list(zip(*cartesian_product(indexes))),
                        columns=sheet_dict.keys()).rename_axis("scenario")
    sndf.index += 1
    

    With product from itertools

    from itertools import product
    
    sndf = pd.DataFrame(product(*indexes),
                        columns=sheet_dict.keys()).rename_axis("scenario")
    sndf.index += 1