I have the following input table (df):
| ColumnA | ColumnB | Blocks |
|---|---|---|
| A | 12 | 1 |
| B | 32 | 1 |
| C | 44 | 1 |
| D | 76 | 2 |
| E | 99 | 2 |
| F | 123 | 2 |
| G | 65 | 2 |
| H | 87 | 3 |
| I | 76 | 3 |
| J | 231 | 3 |
| k | 80 | 4 |
| l | 55 | 4 |
| m | 27 | 5 |
| n | 67 | 5 |
| o | 34 | 5 |
I would like to perform block randomization so that one row is picked from each block (one each from blocks 1, 2, 3, 4, and 5), with the picks collected in a separate table.
The output should look something like the following:
| ColumnA | ColumnB | Blocks | Groups |
|---|---|---|---|
| B | 32 | 1 | A1 |
| E | 99 | 2 | A1 |
| I | 76 | 3 | A1 |
| l | 55 | 4 | A1 |
| m | 27 | 5 | A1 |
| A | 12 | 1 | A2 |
| F | 123 | 2 | A2 |
| k | 80 | 3 | A2 |
| m | 27 | 4 | A2 |
| n | 67 | 5 | A2 |
| C | 44 | 1 | A3 |
| H | 87 | 2 | A3 |
| J | 231 | 3 | A3 |
| n | 67 | 4 | A3 |
| o | 34 | 5 | A4 |
| D | 76 | 1 | A4 |
| G | 65 | 2 | A4 |
The rows should be selected at random such that each group contains every block (evenly distributed).
What I have tried so far:
df = df.groupby('blocks').apply(lambda x: x.sample(frac=1,random_state=1234)).reset_index(drop=True)
treatment_groups = [f"A{i}" for i in range(1, n+1)]
df['Groups'] = (df.index // n).map(dict(zip(idx, treatment_groups)))
This doesn't randomize according to the blocks column. How do I do that?
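Let's define a generator function that draws one random row from each block for every group:
import pandas as pd

def random_samples(n):
    # For each of the n groups, draw one random row from every block
    for i in range(1, n + 1):
        for _, g in df.groupby('Blocks'):
            # Tag the sampled row with its group label (A1, A2, ...)
            yield g.sample(n=1).assign(Groups=f'A{i}')

sampled = pd.concat(random_samples(4), ignore_index=True)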
>>> sampled
   ColumnA  ColumnB  Blocks Groups
0        A       12       1     A1
1        D       76       2     A1
2        I       76       3     A1
3        k       80       4     A1
4        n       67       5     A1
5        C       44       1     A2
6        G       65       2     A2
7        J      231       3     A2
8        l       55       4     A2
9        m       27       5     A2
10       B       32       1     A3
11       G       65       2     A3
12       H       87       3     A3
13       l       55       4     A3
14       m       27       5     A3
15       B       32       1     A4
16       F      123       2     A4
17       I       76       3     A4
18       l       55       4     A4
19       m       27       5     A4
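Note that the approach above samples each group independently, so the same row can land in more than one group (row `l` shows up in A2, A3 and A4 in the output). If each row should be used at most once, one possible variant is to shuffle the rows once and number them within each block. This is only a sketch: the helper name `assign_groups_without_reuse` and its `seed` parameter are made up for illustration, and it assumes you are fine with the number of groups being capped by the smallest block (block 4 has two rows, so only A1 and A2 can be complete here).

```python
import pandas as pd

# Rebuild the input table from the question so the example is self-contained.
df = pd.DataFrame({
    "ColumnA": list("ABCDEFGHIJklmno"),
    "ColumnB": [12, 32, 44, 76, 99, 123, 65, 87, 76, 231, 80, 55, 27, 67, 34],
    "Blocks": [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5],
})

def assign_groups_without_reuse(frame, block_col="Blocks", seed=None):
    """Hypothetical helper: give each group one distinct row per block."""
    # Shuffle all rows once; the order inside every block is now random.
    shuffled = frame.sample(frac=1, random_state=seed)
    # Number the rows of each block 0, 1, 2, ... in that shuffled order.
    rank = shuffled.groupby(block_col).cumcount()
    # Keep only the ranks that every block can supply, so groups are complete.
    n_groups = shuffled[block_col].value_counts().min()
    keep = rank < n_groups
    kept = shuffled[keep].copy()
    # Rank 0 becomes group A1, rank 1 becomes A2, and so on.
    kept["Groups"] = "A" + (rank[keep] + 1).astype(str)
    return kept.sort_values(["Groups", block_col], ignore_index=True)

result = assign_groups_without_reuse(df, seed=1234)
print(result)
```

Passing a fixed `seed` makes the assignment reproducible; with `seed=None` every call gives a different draw. Rows that do not fit into a complete group are dropped, which may or may not be what you want.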