First, I want to create a column that randomizes 1s and 0s by group while keeping the same proportion of 1s and 0s as in another column.
Second, I want to repeat this procedure many times (say 1000) and calculate the expected value.
Let me clarify with hypothetical data.
```r
library(data.table)
district <- c(1,1,1,1,2,2,2,2,2,3,3,3,3,3,3,3)
village <- c(1,2,3,4,1,2,3,4,5,1,2,3,4,5,6,7)
status <- c(1,0,1,0, 1,1,1,0,0,1,1,1,1,0,0,0)
datei <- data.table(district, village, status)
```
What I want is a column that randomizes 1s and 0s within each district while keeping the same proportion of 1s and 0s as in `status`; the 1:0 ratios are 2:2, 3:2 and 4:3 in districts 1, 2 and 3, respectively.
Second, I also want to repeat this randomization many times (say 1000 times) and calculate the expected value for each row.
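The per-district 1:0 counts quoted above can be confirmed by tabulating `status` by district. This is a small sketch on the toy data; the column names `ones` and `zeros` are illustrative, not part of the question.

```r
library(data.table)

# Toy data from the question
datei <- data.table(
  district = c(1,1,1,1,2,2,2,2,2,3,3,3,3,3,3,3),
  village  = c(1,2,3,4,1,2,3,4,5,1,2,3,4,5,6,7),
  status   = c(1,0,1,0,1,1,1,0,0,1,1,1,1,0,0,0)
)

# Count the 1s and 0s of status within each district
counts <- datei[, .(ones = sum(status == 1), zeros = sum(status == 0)), by = district]
print(counts)
```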
I know how to randomize 1s and 0s within each district:
```r
datei[, random_status := sample(c(1,0), .N, replace=TRUE), keyby = district]
```
However, I do not know how to keep the same proportion of 1s and 0s as in `status`, nor how to repeat the procedure and calculate the expected value for each row.
Many thanks.
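One option for the first part, sketched here as an alternative to the `replace=TRUE` draw above (not the asker's or the answerer's method), is to permute `status` within each district. A permutation keeps the exact number of 1s and 0s per district and only changes which village gets which value. The seed is arbitrary; `sample.int(.N)` is used instead of `sample(status)` so a one-row group cannot trigger `sample()`'s scalar behaviour.

```r
library(data.table)

# Toy data from the question
datei <- data.table(
  district = c(1,1,1,1,2,2,2,2,2,3,3,3,3,3,3,3),
  village  = c(1,2,3,4,1,2,3,4,5,1,2,3,4,5,6,7),
  status   = c(1,0,1,0,1,1,1,0,0,1,1,1,1,0,0,0)
)

set.seed(1)  # arbitrary seed, for reproducibility only
# Shuffle status within each district: which village gets a 1 changes,
# but the number of 1s and 0s per district does not
datei[, random_status := status[sample.int(.N)], by = district]

# Per-district totals are the same for status and random_status
print(datei[, .(status = sum(status), random_status = sum(random_status)), by = district])
```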
Edit: here is what I expect from calculating the expected value for each row after, say, 1000 repetitions. Column `exp_status` is generated by randomizing many times while keeping the 1:0 proportion within each district the same as in `status`.
district | village | status | exp_status |
---|---|---|---|
1 | 1 | 1 | 0.9 |
1 | 2 | 0 | 0.7 |
1 | 3 | 1 | 0.8 |
1 | 4 | 0 | 0.1 |
2 | 1 | 1 | 0.2 |
2 | 2 | 1 | 0.3 |
2 | 3 | 1 | 0.2 |
2 | 4 | 0 | 0.9 |
2 | 5 | 0 | 0.8 |
3 | 1 | 1 | 0.4 |
3 | 2 | 1 | 0.5 |
3 | 3 | 1 | 0.9 |
3 | 4 | 1 | 0.8 |
3 | 5 | 0 | 0.9 |
3 | 6 | 0 | 0.8 |
3 | 7 | 0 | 0.7 |
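For the repetition step, a sketch assuming that a within-district permutation of `status` is the randomization you want: repeat it `n_rep` times with `replicate()`, keep each draw as a column, and average across draws per row. `n_rep` and the seed are illustrative choices. `ave()` is used so each draw comes back in the original row order even if the data are not sorted by district.

```r
library(data.table)

# Toy data from the question
datei <- data.table(
  district = c(1,1,1,1,2,2,2,2,2,3,3,3,3,3,3,3),
  village  = c(1,2,3,4,1,2,3,4,5,1,2,3,4,5,6,7),
  status   = c(1,0,1,0,1,1,1,0,0,1,1,1,1,0,0,0)
)

set.seed(42)
n_rep <- 1000  # number of repetitions (illustrative)

# Each replicate permutes status within district; ave() returns the result
# in the original row order, so draws is a 16 x n_rep matrix
draws <- replicate(n_rep, ave(datei$status, datei$district,
                              FUN = function(s) s[sample.int(length(s))]))

# Expected value per row = share of draws in which that row got a 1
datei[, exp_status := rowMeans(draws)]
print(datei)
```

Under an exact permutation, every row in a district has the same expected value, namely that district's share of 1s (0.5, 0.6 and 4/7 ≈ 0.571 here). The simulated `exp_status` therefore settles near those values rather than differing row by row.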
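Use `table(status)` as the `prob=` argument of `sample()`. Because it is evaluated within each district, the draws reproduce each district's 1:0 proportion in expectation, so on a large scale the proportions are similar.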
```r
set.seed(42)
datei[, random_status := sample(0:1, .N, replace=TRUE, prob=table(status)), keyby = district]
colMeans(datei[, 3:4])
# status random_status
# 0.56339 0.56277
```
Data (slightly blown up, to 1e5 rows):
```r
datei <- structure(list(district = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3,
3, 3, 3, 3, 3), village = c(1, 2, 3, 4, 1, 2, 3, 4, 5, 1, 2,
3, 4, 5, 6, 7), status = c(1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1,
1, 0, 0, 0)), row.names = c(NA, -16L), class = c("data.table",
"data.frame"))
set.seed(42)
datei <- datei[sample.int(nrow(datei), 1e5, replace=TRUE), ]
```
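The `colMeans()` check compares only the overall share of 1s. A per-district comparison, sketched below on the same blown-up data, checks the property the question actually asks about. The object name `check` is illustrative, and the exact numbers printed depend on the seed.

```r
library(data.table)

# Toy data, blown up to 1e5 rows as in the answer
datei <- data.table(
  district = c(1,1,1,1,2,2,2,2,2,3,3,3,3,3,3,3),
  village  = c(1,2,3,4,1,2,3,4,5,1,2,3,4,5,6,7),
  status   = c(1,0,1,0,1,1,1,0,0,1,1,1,1,0,0,0)
)
set.seed(42)
datei <- datei[sample.int(nrow(datei), 1e5, replace = TRUE), ]

# The answer's draw: 0/1 weighted by the per-district counts of status
set.seed(42)
datei[, random_status := sample(0:1, .N, replace = TRUE, prob = table(status)), keyby = district]

# Share of 1s per district, original vs randomized
check <- datei[, .(status = mean(status), random_status = mean(random_status)), keyby = district]
print(check)
```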