I have the following data frame:
Cluster OPS(4) mean(ln) std(ln)
0 5-894 5-894a 2.203 0.775
1 5-894 5-894b 2.203 0.775
2 5-894 5-894c 2.203 0.775
3 5-894 5-894d 2.203 0.775
4 5-894 5-894e 2.203 0.775
For each surgery type (column OPS(4)) I would like to generate 10,000 scenarios and store them in another data frame.
I know that I can create scenarios with:
num_reps = 10000
scenarios = np.ceil(np.random.lognormal(mean, std, num_reps))
The new data frame should look like this, with 10,000 scenarios in each column:
scen_per_surg = pd.DataFrame(index=range(num_reps), columns=merged_information['OPS(4)'])
OPS(4) 5-894a 5-894b 5-894c 5-894d 5-894e
0 NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN
5 NaN NaN NaN NaN NaN
...
Unfortunately, I don't know how to iterate over the rows of the first data frame to create the scenarios for each surgery type.
Can somebody help me? Best regards
Create some data to experiment with:
import pandas as pd
df = pd.DataFrame(data=[
[ '5-894' , '5-894a' , 2.0 , 0.70],
[ '5-894' , '5-894b' , 2.1 , 0.71],
[ '5-894' , '5-894c' , 2.2 , 0.72],
[ '5-894' , '5-894d' , 2.3 , 0.73],
[ '5-894' , '5-894e' , 2.4 , 0.74] ], columns =['Cluster', 'OPS(4)', 'mean(ln)', 'std(ln)'])
print(df)
Create an empty data frame:
new_df = pd.DataFrame()
Define a function that is applied to each row of the original df, generates the required random values, and assigns them to a column of new_df named after the surgery type:
import numpy as np
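def geb_scenarios(row):
    # Skip 'Cluster'; unpack the surgery code and its log-normal parameters.
    col, mean, std = row.iloc[1:]
    # Draw 10 samples for this surgery type (use 10000 for the full run).
    new_df[col] = np.ceil(np.random.lognormal(mean, std, 10))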
Apply the function to every row (axis=1):
df.apply(geb_scenarios, axis=1)
print(new_df)
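As an alternative to the row-wise apply, the sampling can probably be done in one call, because NumPy broadcasts per-column parameters against the requested output shape. The following is only a sketch: the helper name simulate_scenarios and its seed argument are made up for illustration, and it assumes the column names 'OPS(4)', 'mean(ln)' and 'std(ln)' from the example data.

```python
import numpy as np
import pandas as pd


def simulate_scenarios(params, num_reps=10000, seed=None):
    """Hypothetical helper: one column of log-normal draws per surgery type."""
    rng = np.random.default_rng(seed)
    means = params["mean(ln)"].to_numpy()
    sigmas = params["std(ln)"].to_numpy()
    # The parameter arrays have shape (n_types,) and broadcast against
    # size=(num_reps, n_types), so column j is drawn with means[j], sigmas[j].
    samples = rng.lognormal(mean=means, sigma=sigmas,
                            size=(num_reps, len(params)))
    columns = pd.Index(params["OPS(4)"], name="OPS(4)")
    # Round up to whole units, as in the original np.ceil call.
    return pd.DataFrame(np.ceil(samples), columns=columns)


df = pd.DataFrame(
    data=[
        ["5-894", "5-894a", 2.0, 0.70],
        ["5-894", "5-894b", 2.1, 0.71],
        ["5-894", "5-894c", 2.2, 0.72],
        ["5-894", "5-894d", 2.3, 0.73],
        ["5-894", "5-894e", 2.4, 0.74],
    ],
    columns=["Cluster", "OPS(4)", "mean(ln)", "std(ln)"],
)

scen_per_surg = simulate_scenarios(df, num_reps=10000, seed=0)
print(scen_per_surg.shape)  # (10000, 5)
```

This avoids mutating a global data frame from inside apply, and passing a seed makes the scenarios reproducible. It does assume that the values in OPS(4) are unique, since they become the column labels.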