I am trying to write a function that resamples names nested in groups. My function works when groups are ignored, but I don't want a name to be replaced by a name from a different group.
Here's the function, where x is a vector of all names (some repeated), a is a vector of the unique names, and b is the same unique names in randomized order.
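# Named swap_names rather than rep so that it does not mask base::rep()
swap_names <- function(x, a, b) {
  x1 <- x  # copy once, before the loop, so earlier replacements are kept
  for (i in seq_along(a)) {
    x1[x == a[i]] <- b[i]  # match against the original x, not the partly replaced x1
  }
  x1
}
x <- c("Smith", "Jones", "Washington", "Miller", "Wells", "Smith", "Smith", "Miller")
a <- sort(unique(x))
b <- sample(a, length(a))
dat <- swap_names(x, a, b)
View(dat)
The result has the same length as x, with every name replaced by its counterpart in b. The exact values change from run to run because no seed is set.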
However, each name is nested in a group, so I need to avoid creating samples of names that are not in the same group. For example:
x groupid
Smith A1
Jones B1
Washington C1
Miller A2
Wells B1
Smith A2
Smith A3
Miller A3
How can I account for that?
This would be easier to accomplish with the tidyverse packages:
library(tidyverse)
txt <- 'x groupid
Smith A1
Jones B1
Washington C1
Miller A2
Wells B1
Smith A2
Smith A3
Miller A3'
df <- read_table(file = txt)
set.seed(0)
df.new <- df %>%
  group_by(groupid) %>%
  mutate(
    b = sample(unique(x), n(), replace = TRUE)
  ) %>%
  arrange(groupid)
  x          groupid b
  <chr>      <chr>   <chr>
1 Smith      A1      Smith
2 Miller     A2      Miller
3 Smith      A2      Smith
4 Smith      A3      Smith
5 Miller     A3      Miller
6 Jones      B1      Wells
7 Wells      B1      Jones
8 Washington C1      Washington
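Note that the pipeline above draws one name per row with replacement, so within a group the same replacement can come up more than once, and repeated names are not guaranteed a consistent substitute. If a one-to-one mapping inside each group is wanted instead (closer to the a/b vectors in the question), a base R version might look like the sketch below. The function name resample_within_groups is made up for illustration, and it assumes x and groupid are parallel character vectors.

```r
# Hypothetical helper (not from any package): within each group, map every
# unique name to a randomly permuted unique name from the same group.
resample_within_groups <- function(x, groupid) {
  out <- x
  for (g in unique(groupid)) {
    idx <- which(groupid == g)        # rows that belong to this group
    a <- sort(unique(x[idx]))         # unique names in the group
    b <- a[sample.int(length(a))]     # the same names in random order
    out[idx] <- b[match(x[idx], a)]   # look up each row's replacement
  }
  out
}

x <- c("Smith", "Jones", "Washington", "Miller", "Wells", "Smith", "Smith", "Miller")
groupid <- c("A1", "B1", "C1", "A2", "B1", "A2", "A3", "A3")

set.seed(0)
dat <- data.frame(x, groupid, b = resample_within_groups(x, groupid))
dat
```

Groups that contain a single name (A1 and C1 here) always keep that name, since there is nothing else in the group to swap with. Using sample.int() on the length of a, rather than sample(a), keeps the permutation step safe even when a group has only one element.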