My dataframe looks like this:
City | Mean | SD |
---|---|---|
Newcastle | 60 | 0.81 |
Liverpool | 62 | 0.91 |
Cardiff | 65 | 0.87 |
Glasgow | 59 | 0.86 |
I want to add a column 'n' containing new random values drawn using the Mean and SD values in each row. I've done this before using:
df['n'] = np.random.normal(df['Mean'], df['SD'])
I then want to add a second column which generates a quintile rank based on the value in 'n'. I've done this using:
df['q'] = pd.qcut(df['n'], 5, labels = False)
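One detail worth checking here: with `labels = False`, `pd.qcut` returns zero-based bin codes, so five quintiles come back as 0 to 4 rather than 1 to 5. The short sketch below illustrates this on a made-up series (the values are hypothetical, chosen only so that each quintile holds two points) and shows one way to shift to a 1-to-5 rank, assuming that is what is wanted.

```python
import pandas as pd

# Hypothetical sample values, used only to illustrate the labelling
values = pd.Series([57.2, 61.4, 60.1, 55.3, 58.8, 63.0, 59.5, 62.2, 56.1, 64.7])

# labels=False returns integer bin codes starting at 0
q_zero_based = pd.qcut(values, 5, labels=False)

# Shift by one if a 1-to-5 quintile rank is preferred
q_one_based = q_zero_based + 1

print(pd.DataFrame({"value": values, "q0": q_zero_based, "q1": q_one_based}))
```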
The desired result, repeating the two steps, would look something like this:
City | Mean | SD | n | q | n+1 | q+1 |
---|---|---|---|---|---|---|
Newcastle | 60 | 0.81 | 57 | 5 | 55 | 5 |
Liverpool | 62 | 0.91 | 61 | 1 | 57 | 4 |
Cardiff | 65 | 0.87 | 60 | 1 | 61 | 1 |
Glasgow | 59 | 0.86 | 55 | 3 | 58 | 3 |
I would like to loop these two steps to add 2000 columns: 1000 'n' columns (named 'n+1' and so on) and 1000 'q' columns (named 'q+1' and so on).
This was resolved using:
import numpy as np
import pandas as pd

mean = df['Mean']
std_dev = df['SD']
for i in range(1000):
    # Draw a fresh value per row from that row's Mean and SD
    col_name = 'col_' + str(i)
    df[col_name] = np.random.normal(mean, std_dev)
    # Quintile code (0 to 4) of the values just drawn
    col_name_q = 'col_q_' + str(i)
    df[col_name_q] = pd.qcut(df[col_name], 5, labels=False)
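Adding 2000 columns one at a time can be slow, and recent pandas versions may emit a fragmentation `PerformanceWarning` when doing so. As an alternative, here is a minimal sketch that draws every sample in one call and attaches the new columns with a single `pd.concat`. It assumes the same column naming as the loop above; `n_draws`, `rng`, `samples`, `n_cols`, `q_cols` and `result` are names introduced here for illustration, and the seed is arbitrary.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "City": ["Newcastle", "Liverpool", "Cardiff", "Glasgow"],
    "Mean": [60, 62, 65, 59],
    "SD": [0.81, 0.91, 0.87, 0.86],
})

n_draws = 1000                        # number of simulated columns
rng = np.random.default_rng(seed=0)   # arbitrary seed, for reproducibility only

# One call draws a (rows x n_draws) matrix; the [:, None] reshapes let
# each row's Mean and SD broadcast across all of that row's draws.
samples = rng.normal(
    loc=df["Mean"].to_numpy()[:, None],
    scale=df["SD"].to_numpy()[:, None],
    size=(len(df), n_draws),
)
n_cols = pd.DataFrame(
    samples, index=df.index, columns=[f"col_{i}" for i in range(n_draws)]
)

# Quintile codes (0 to 4) computed separately within each simulated column
q_cols = n_cols.apply(lambda s: pd.qcut(s, 5, labels=False))
q_cols.columns = [f"col_q_{i}" for i in range(n_draws)]

# Interleave as col_0, col_q_0, col_1, col_q_1, ... and attach in one concat
order = [name for pair in zip(n_cols.columns, q_cols.columns) for name in pair]
result = pd.concat([df, pd.concat([n_cols, q_cols], axis=1)[order]], axis=1)

print(result.iloc[:, :7])
```

Building the new columns in separate frames and concatenating once avoids repeated insertion into `df`; the per-column `qcut` is still applied column by column, since the quintile boundaries differ for every draw.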