This is very basic, but I couldn't find an answer online. I use R and have a dataset like this (but much larger):
set.seed(123)
id<-c(1,1,1,2,2,3,3,3,3,3,4,5,5,6,6,6)
week<-c(1,2,3,1,2,1,2,3,4,5,1,1,2,1,2,3)
value<-rnorm(16, mean=5, sd=1)
mydf<-data.frame(id, week, value)
id refers to a particular person, and some individuals have more observations than others. I'd like to take a sample of individuals from the data frame such that, for each sampled individual, all of that individual's rows are included in the sample. If I do
mydf[sample(nrow(mydf),3),]
I obviously just get three random rows, whereas I'd like to get, for instance:
id week value
1 1 4.439524
1 2 4.769823
1 3 6.558708
4 1 6.224082
6 1 5.110683
6 2 4.444159
6 3 6.786913
How can I sample rows with this constraint? Thank you in advance!
One option is to sample from the unique ids and then keep every row belonging to a sampled id:
# set seed for reproducibility
set.seed(958)
# Sample size
n <- 3
# Take simple random sample from the ids present
sampled_ids <- sample(unique(mydf$id), n)
# Keep only rows of the sampled IDs
mydf[mydf$id %in% sampled_ids, ]
# id week value
# 4 2 1 5.070508
# 5 2 2 5.129288
# 12 5 1 5.359814
# 13 5 2 5.400771
# 14 6 1 5.110683
# 15 6 2 4.444159
# 16 6 3 6.786913
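If you need this more than once, the same idea can be wrapped in a small function. This is only a sketch: the name sample_by_id and its arguments are made up for illustration. It indexes into the vector of ids rather than calling sample(ids, n) directly, because sample() treats a single number x as 1:x, which would give wrong ids if only one numeric id were present.

```r
# Rebuild the example data from the question
set.seed(123)
id    <- c(1,1,1,2,2,3,3,3,3,3,4,5,5,6,6,6)
week  <- c(1,2,3,1,2,1,2,3,4,5,1,1,2,1,2,3)
value <- rnorm(16, mean = 5, sd = 1)
mydf  <- data.frame(id, week, value)

# Hypothetical helper: sample n ids without replacement and
# return all rows that belong to the sampled ids
sample_by_id <- function(df, id_col, n) {
  ids <- unique(df[[id_col]])
  stopifnot(n <= length(ids))
  # Sample positions, then look up the ids at those positions
  keep <- ids[sample.int(length(ids), n)]
  df[df[[id_col]] %in% keep, , drop = FALSE]
}

res <- sample_by_id(mydf, "id", 3)
res
```

The result always contains exactly n distinct ids, but the number of rows varies with how many observations each sampled person has.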