[SOLVED] Create a Pandas dataframe of normal estimates based on varying row requirements

I understand that this draws a single sample (returned as a one-element array):

samples = np.random.normal(loc=df_avgs['AVERAGE'][region], scale=df_avgs['STDEV'][region], size=1)

But I want to draw one sample for each row, with parameters that depend on a condition. For instance, I have one df of means and standard deviations and another df of conditions.

df_avgs

REGION	AVERAGE	STDEV
0	-1.61	7.75
1	2.87	8.38
2	3.61	7.61
3	-10.26	9.19

df_conditions

ID	REGION_NAME
0	Region 0
1	Region 3
2	Region 2
3	Region 1
4	Region 1
5	Region 2
6	Region 3

How do I create a df with len(df_conditions) rows, or just add a column to df_conditions, where each sample is drawn using that row's region?

Solution

IIUC, you can extract the region number from REGION_NAME, merge the two dataframes on REGION, and then assign the values using a list comprehension over a zip of the AVERAGE and STDEV columns:

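import numpy as np

# Extract the region number, then attach each row's AVERAGE and STDEV (how='left' keeps every row, in order).
df_zip = df_conditions.assign(REGION=df_conditions['REGION_NAME'].str.extract(r'(\d+)', expand=False).astype(int)).merge(df_avgs, on='REGION', how='left')

# Draw one sample per row from that row's normal distribution.
df_conditions['SAMPLES'] = [np.random.normal(loc=l, scale=s) for l, s in zip(df_zip['AVERAGE'], df_zip['STDEV'])]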

print(df_conditions)

Output (the sampled values will differ from run to run):

   ID REGION_NAME    SAMPLES
0   0    Region 0  -2.475624
1   1    Region 3  -7.157439
2   2    Region 2  -4.563650
3   3    Region 1  -2.199240
4   4    Region 1   5.221416
5   5    Region 2   7.175620
6   6    Region 3 -22.775366
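
As a possible alternative, np.random.normal (and the newer Generator.normal) accepts arrays for loc and scale, so the list comprehension can be replaced by a single vectorized call. The following is only a minimal sketch: the two frames are rebuilt by hand from the tables in the question, the lookup uses Series.map instead of a merge, and the seed value is an arbitrary assumption added so that reruns are reproducible.

```python
import numpy as np
import pandas as pd

# Example frames rebuilt by hand from the tables in the question.
df_avgs = pd.DataFrame({
    "REGION": [0, 1, 2, 3],
    "AVERAGE": [-1.61, 2.87, 3.61, -10.26],
    "STDEV": [7.75, 8.38, 7.61, 9.19],
})
df_conditions = pd.DataFrame({
    "ID": range(7),
    "REGION_NAME": ["Region 0", "Region 3", "Region 2", "Region 1",
                    "Region 1", "Region 2", "Region 3"],
})

# Pull the numeric region id out of the name; \d+ also handles multi-digit ids.
region = df_conditions["REGION_NAME"].str.extract(r"(\d+)", expand=False).astype(int)

# Look up each row's mean and standard deviation by region id.
params = df_avgs.set_index("REGION")
loc = region.map(params["AVERAGE"]).to_numpy()
scale = region.map(params["STDEV"]).to_numpy()

# Seeded generator (the seed is an arbitrary choice, not part of the original answer).
rng = np.random.default_rng(0)

# loc and scale are arrays, so one call draws one sample per row.
df_conditions["SAMPLES"] = rng.normal(loc=loc, scale=scale)

print(df_conditions)
```

Because the parameters are looked up with map, the result stays aligned with df_conditions row by row, and a region missing from df_avgs would show up as NaN rather than silently shortening the result.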