I understand that this draws a single sample (returned as a one-element NumPy array):
samples = np.random.normal(loc=df_avgs['AVERAGE'][region], scale=df_avgs['STDEV'][region], size=1)
But I want to draw one sample for each row, based on a condition. For instance, I have one DataFrame of means and standard deviations, and another of conditions.
df_avgs
REGION | AVERAGE | STDEV |
---|---|---|
0 | -1.61 | 7.75 |
1 | 2.87 | 8.38 |
2 | 3.61 | 7.61 |
3 | -10.26 | 9.19 |
df_conditions
ID | REGION_NAME |
---|---|
0 | Region 0 |
1 | Region 3 |
2 | Region 2 |
3 | Region 1 |
4 | Region 1 |
5 | Region 2 |
6 | Region 3 |
How do I create a DataFrame with len(df_conditions) rows, or just add a column to df_conditions, with samples drawn from each row's region?
IIUC, you can merge the two DataFrames and then assign the values using a list comprehension over a zip of the two merged columns:
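# Parse the region number from REGION_NAME, then look up AVERAGE and STDEV; how='left' keeps df_conditions' row order.
df_zip = df_conditions.assign(REGION=df_conditions['REGION_NAME'].str.extract(r'(\d+)', expand=False).astype(int)).merge(df_avgs, on='REGION', how='left')
# Draw one normal sample per row from that row's mean and standard deviation (assigned by position).
df_conditions['SAMPLES'] = [np.random.normal(loc=l, scale=s) for l, s in zip(df_zip['AVERAGE'], df_zip['STDEV'])]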
print(df_conditions)
Output (the samples are random, so your values will differ):
   ID REGION_NAME    SAMPLES
0   0    Region 0  -2.475624
1   1    Region 3  -7.157439
2   2    Region 2  -4.563650
3   3    Region 1  -2.199240
4   4    Region 1   5.221416
5   5    Region 2   7.175620
6   6    Region 3 -22.775366
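The list comprehension is not strictly needed, because NumPy's normal sampler accepts array-valued loc and scale and then draws one value per element. Here is a minimal sketch of that variant, assuming the two frames look like the tables in the question. The seeded `rng` and the names `region`, `stats`, `loc` and `scale` are illustrative choices, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Example frames rebuilt from the tables in the question.
df_avgs = pd.DataFrame({
    "REGION": [0, 1, 2, 3],
    "AVERAGE": [-1.61, 2.87, 3.61, -10.26],
    "STDEV": [7.75, 8.38, 7.61, 9.19],
})
df_conditions = pd.DataFrame({
    "ID": range(7),
    "REGION_NAME": ["Region 0", "Region 3", "Region 2", "Region 1",
                    "Region 1", "Region 2", "Region 3"],
})

# Region number for each row, parsed from REGION_NAME.
region = df_conditions["REGION_NAME"].str.extract(r"(\d+)", expand=False).astype(int)

# Per-row mean and standard deviation, looked up by region number.
stats = df_avgs.set_index("REGION")
loc = region.map(stats["AVERAGE"])
scale = region.map(stats["STDEV"])

# Assumption: a seeded Generator, used here only to make the run repeatable.
rng = np.random.default_rng(0)

# loc and scale are arrays, so a single call draws one sample per row.
df_conditions["SAMPLES"] = rng.normal(loc=loc.to_numpy(), scale=scale.to_numpy())
print(df_conditions)
```

Because `map` looks each row up by its region number, the result stays aligned with df_conditions without relying on the row order of a merge. A region missing from df_avgs would give NaN for that row's sample rather than raising an error.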