
Mystery bug in sampling for loop in R


I am trying to understand what is causing this bug in my R code and I feel like R is gaslighting me.

The sample() function seems to change depending on how I assign it?

Anyways, here is the MRE:

#Sampling Bug MRE
rm(list = ls())
library(tidyverse)
ages=c(paste0("CHILD",seq(1,10),"AGE"))
set.seed(26)
df=c()
for(i in 1:10){
  x=round(runif(1:100,min=1,max=20),0)
  df = as.data.frame(cbind(df,x))
}
names(df)=ages

set.seed(26)
df$`Sampled Child`=0
test_vector=c()
for(i in 1:nrow(df)){
  childs_age = unlist(c(as.numeric(df[i,ages])))
  slice=which(childs_age<=17)
  if(length(slice)>=1){
    df$`Sampled Child`[i]=sample(x=slice,size=1,replace = F)
    test_vector=append(test_vector,sample(x=slice,size=1,replace = F))
  }
  else{
    df$`Sampled Child`[i]="Ineligible"
    test_vector=append(test_vector,"Ineligible")
  }
}
df$test=test_vector
sum(df$`Sampled Child`==df$test)

I just need someone to explain why assigning the value with df$`Sampled Child`[i] gives a different number than appending it to a vector.

TIA!

I am trying to sample only a child who is less than 17 years old. Once I know which children are less than 17, I pick one at random. If there are no children less than 17, they are ineligible.


Solution

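  • You're getting different answers because you're calling sample() twice. Each call makes a fresh random draw, so the value stored in df$`Sampled Child`[i] and the value appended to test_vector are two independent samples, not the same one.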

    If your code instead looked like this:

      if (length(slice) >= 1) {
        cur_samp <- sample(x = slice, size = 1, replace = FALSE)
        df$`Sampled Child`[i] <- cur_samp
        test_vector <- append(test_vector, cur_samp)
      }
    

    then the two results should be equal.
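
    As a small illustration of the point above, here is a sketch using a made-up vector (`eligible` is a hypothetical stand-in for `slice`, not taken from your data). It contrasts two separate sample() calls with a single call whose result is stored and reused:

    ```r
    # Hypothetical toy vector standing in for `slice` in the question
    eligible <- c(2, 5, 7, 9)

    # Two separate calls: each one consumes fresh random numbers,
    # so the second draw is independent of the first
    set.seed(26)
    first_draw  <- sample(x = eligible, size = 1)
    second_draw <- sample(x = eligible, size = 1)

    # One call, stored once and reused: both destinations get the same value
    set.seed(26)
    cur_samp <- sample(x = eligible, size = 1)
    stored_a <- cur_samp
    stored_b <- cur_samp

    identical(stored_a, stored_b)    # same object copied twice
    identical(cur_samp, first_draw)  # same seed, same first draw
    ```

    The two separate draws can still coincide by chance, which is why some rows in your data frame matched and others did not.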

    For what it's worth, growing data frames and vectors by repeatedly appending to them (or inserting into positions beyond the end of the vector) is inefficient in R; it's the second circle of the R Inferno. It would be better to create a vector of the appropriate length (e.g. filled with NA values) first, then assign to appropriate elements as you go.
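
    A minimal sketch of that preallocation idea, assuming the same column names and seed as in your MRE. The names `n` and `sampled` are hypothetical, and building the data frame from a matrix is one possible way to avoid the cbind() loop, not the only one. The sketch also indexes into `slice` with sample.int() rather than calling sample(slice, 1), because sample() treats a single number x >= 1 as "sample from 1:x", which matters when only one child is eligible:

    ```r
    set.seed(26)
    ages <- paste0("CHILD", 1:10, "AGE")

    # Build the 100 x 10 data frame in one step instead of cbind-ing in a loop
    df <- as.data.frame(
      matrix(round(runif(100 * 10, min = 1, max = 20)), nrow = 100, ncol = 10)
    )
    names(df) <- ages

    n <- nrow(df)
    sampled <- rep(NA_character_, n)  # preallocated result vector

    for (i in seq_len(n)) {
      childs_age <- unlist(df[i, ages])
      slice <- which(childs_age <= 17)  # positions of eligible children
      if (length(slice) >= 1) {
        # Draw one position within slice, then look up the child index
        cur_samp <- slice[sample.int(length(slice), size = 1)]
        sampled[i] <- as.character(cur_samp)
      } else {
        sampled[i] <- "Ineligible"
      }
    }

    # Both columns come from the same stored draws, so every row matches
    df$`Sampled Child` <- sampled
    df$test <- sampled
    sum(df$`Sampled Child` == df$test)
    ```

    Storing the results as character from the start also avoids the silent numeric-to-character coercion that happens when "Ineligible" is assigned into a numeric column.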