python dataframe scenarios

Generate scenarios with different means from a data frame


I have the following data frame:

           Cluster  OPS(4)  mean(ln)  std(ln)
0           5-894  5-894a     2.203    0.775
1           5-894  5-894b     2.203    0.775
2           5-894  5-894c     2.203    0.775
3           5-894  5-894d     2.203    0.775
4           5-894  5-894e     2.203    0.775

For each surgery type (column OPS(4)), I would like to generate 10,000 scenarios and store them in another data frame.

I know that I can create scenarios with:

 # mean and std are the values from one row of the data frame
 num_reps = 10_000
 scenarios  = np.ceil(np.random.lognormal(mean, std, num_reps))

And the new data frame should look like this, with 10,000 scenarios in each column:

scen_per_surg = pd.DataFrame(index=range(num_reps), columns=merged_information['OPS(4)'])

OPS(4) 5-894a 5-894b 5-894c 5-894d 5-894e 
0         NaN    NaN    NaN    NaN    NaN    
1         NaN    NaN    NaN    NaN    NaN    
2         NaN    NaN    NaN    NaN    NaN    
3         NaN    NaN    NaN    NaN    NaN    
4         NaN    NaN    NaN    NaN    NaN    
5         NaN    NaN    NaN    NaN    NaN    
...

Unfortunately, I don't know how to iterate over the rows of the first data frame to create the scenarios.

Can somebody help me? Best regards


Solution

  • Create some sample data

    import pandas as pd
    df = pd.DataFrame(data=[
                              [ '5-894' , '5-894a'  ,   2.0 ,   0.70],
                              [ '5-894' , '5-894b'  ,   2.1 ,   0.71],
                              [ '5-894' , '5-894c'  ,   2.2 ,   0.72],
                              [ '5-894' , '5-894d'  ,   2.3 ,   0.73],
                              [ '5-894' , '5-894e'  ,   2.4 ,   0.74] ], columns =['Cluster', 'OPS(4)', 'mean(ln)', 'std(ln)'])
    print(df)
    

    Create an empty dataframe

    new_df = pd.DataFrame()
    

    Define a function that is applied to each row of the original df, generates the required random values, and assigns them to a column in new_df

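    import numpy as np
    num_reps = 10  # small for demonstration; use 10_000 for the full run
    def geb_scenarios(row):
      # Read the values by column label instead of by position
      col, mean, std = row['OPS(4)'], row['mean(ln)'], row['std(ln)']
      # Draw num_reps lognormal samples and round up to whole numbers
      new_df[col] = np.ceil(np.random.lognormal(mean, std, num_reps))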
    

    Apply the function

    df.apply(geb_scenarios, axis=1)
    print(new_df)
    

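As an alternative to applying a function row by row, all draws can probably be generated in a single call, since NumPy broadcasts per-column parameters against the requested output shape. The following is only a minimal sketch, assuming the sample data from this answer and NumPy's `np.random.default_rng` generator; the seed and the name `samples` are illustrative choices, not part of the original question.

```python
import numpy as np
import pandas as pd

# Sample data as in the answer (values are made up for experimenting)
df = pd.DataFrame(
    {
        "Cluster": ["5-894"] * 5,
        "OPS(4)": ["5-894a", "5-894b", "5-894c", "5-894d", "5-894e"],
        "mean(ln)": [2.0, 2.1, 2.2, 2.3, 2.4],
        "std(ln)": [0.70, 0.71, 0.72, 0.73, 0.74],
    }
)

num_reps = 10_000
rng = np.random.default_rng(seed=0)  # seed is an assumption, used only for reproducibility

# The parameter arrays have one entry per surgery type (length 5) and are
# broadcast against the shape (num_reps, 5), so column j uses row j's
# mean(ln) and std(ln).
samples = rng.lognormal(
    mean=df["mean(ln)"].to_numpy(),
    sigma=df["std(ln)"].to_numpy(),
    size=(num_reps, len(df)),
)

# Round up as in the question and label the columns with the OPS(4) codes
scen_per_surg = pd.DataFrame(np.ceil(samples), columns=df["OPS(4)"].to_numpy())
print(scen_per_surg.shape)
```

This avoids modifying a global data frame from inside `apply`, and it creates all columns at once instead of inserting them one by one.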