r sample-size

samplesize package in R, understanding the parameters


Small disclaimer: I considered posting this on Cross Validated, but I feel it is more about a software implementation. Feel free to migrate the question if you disagree.

I am trying out the samplesize package and trying to work out what the k parameter of the function n.ttest means. The documentation says only:

k Sample fraction k

This is not very helpful. What exactly is this parameter?

I am performing the following calculation. All the essential values are in the vals variable, which I provide below:

power <- 0.90
alpha <- 0.05
vals <- ??? # These values are provided below
mean.diff <- vals[1,2]-vals[2,2]
sd1 <- vals[1,3]
sd2 <- vals[2,3]
k <- vals[2,4]/(vals[1,4]+vals[2,4])
design <- "unpaired"
fraction <- "unbalanced"
variance <- "equal"

# Get the sample size
n.ttest(power = power, alpha = alpha, mean.diff = mean.diff, 
        sd1 = sd1, sd2 = sd2, k = k, design = design, 
        fraction = fraction, variance = variance)

vals contains the following values:

> vals
  affected       mean       sd length
1        1 -0.8007305 7.887657     57
2        2  4.5799913 6.740781     16

Is k the proportion of one group in the total number of observations, or is it something else? If it is a proportion, does it correspond to the group with sd1 or the group with sd2?


Solution

  • Your first instinct was right: this belongs on stats.SE rather than SO. The parameter k has a statistical interpretation that can be found in any reference on power analysis. In an unbalanced two-sample design, the second sample is constrained to be a fixed fraction of the first, and k is that fraction. It sets the size of the second sample relative to the first (n2 = k * n1), not the second sample's share of the total.
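
    Since k is a fraction of the first sample rather than of the total, a share of the total has to be converted before it is passed as k. The following is a minimal base R sketch of that conversion. The helper name ratio_from_share is hypothetical and not part of samplesize, and the group sizes 57 and 16 are the ones from the question.

    ```r
    # Hypothetical helper (not part of the samplesize package):
    # convert group 2's share of the total, p = n2 / (n1 + n2),
    # into the ratio k = n2 / n1.
    ratio_from_share <- function(p) {
      stopifnot(p > 0, p < 1)
      p / (1 - p)
    }

    n1 <- 57  # size of group 1 in the question
    n2 <- 16  # size of group 2 in the question

    share <- n2 / (n1 + n2)       # what the question computed as k
    k <- ratio_from_share(share)  # the ratio n2 / n1

    # The converted value agrees with the direct ratio.
    stopifnot(isTRUE(all.equal(k, n2 / n1)))
    ```

    The two quantities coincide only in the limit of a very small second group, so they should not be used interchangeably.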

    You can see the relevant lines of the code here (lines 106 to 120 of n.ttest):

    unbalanced = {
        df <- n.start - 2
        c <- (mean.diff/sd1) * (sqrt(k)/(1 + k))
        tkrit.alpha <- qt(conf.level, df = df)
        tkrit.beta <- qt(power, df = df)
        n.temp <- ((tkrit.alpha + tkrit.beta)^2)/(c^2)
        while (n.start <= n.temp) {
            n.start <- n.start + 1
            tkrit.alpha <- qt(conf.level, df = n.start - 2)
            tkrit.beta <- qt(power, df = n.start - 2)
            n.temp <- ((tkrit.alpha + tkrit.beta)^2)/(c^2)
        }
        n1 <- n.start/(1 + k)
        n2 <- k * n1
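
    The last two lines of the excerpt are what tie k to the group sizes. Here is a minimal base R sketch of just that split. The helper split_total is hypothetical (it is not part of samplesize), and the total of 100 is an arbitrary value chosen purely for illustration.

    ```r
    # Hypothetical helper mirroring the last two lines of the excerpt:
    # split a total sample size N into two groups for a given k.
    split_total <- function(N, k) {
      n1 <- N / (1 + k)
      n2 <- k * n1
      c(n1 = n1, n2 = n2)
    }

    k <- 16 / 57                  # ratio of group sizes from the question
    sizes <- split_total(100, k)  # 100 is an arbitrary illustrative total

    # The two groups add up to the total, and their ratio n2 / n1 is k.
    stopifnot(isTRUE(all.equal(sum(sizes), 100)))
    stopifnot(isTRUE(all.equal(unname(sizes["n2"] / sizes["n1"]), k)))
    ```

    Before any rounding, n2 / n1 is exactly k, which is why k has to be the ratio of the two group sizes.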
    

    In your case:

    library(samplesize)
    
    vals = data.frame(
      affected = c(1, 2), 
      mean = c(-0.8007305, 4.5799913), 
      sd = c(7.887657, 6.740781), 
      length = c(57, 16))
    
    power <- 0.90
    alpha <- 0.05
    mean.diff <- vals[1,2]-vals[2,2]
    sd1 <- vals[1,3]
    sd2 <- vals[2,3]
    # k <- vals[2,4]/(vals[1,4]+vals[2,4])  # share of the total: not what n.ttest expects
    k <- vals[2,4]/vals[1,4]                # ratio n2/n1: what n.ttest expects
    
    design <- "unpaired"
    fraction <- "unbalanced"
    variance <- "equal"
    
    # Get the sample size
    tt1 = n.ttest(power = power, 
            alpha = alpha, 
            mean.diff = mean.diff, 
            sd1 = sd1, 
            sd2 = sd2, 
            k = k, 
            design = design, 
            fraction = fraction, 
            variance = variance)
    

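    The result confirms this: the size of group 2 is the size of group 1 multiplied by the returned Fraction, rounded up: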

    assertthat::are_equal(ceiling(tt1$`Sample size group 1`*tt1$Fraction), 
                          tt1$`Sample size group 2`)