
Generating multiple crosstabs with lapply and summarytools::ctable in R


I have data similar to the NHANES data I load below. I would like to loop over a list of several variables to create crosstabs. I would like to stick with the ctable function from the summarytools package because I want to use its chisq argument. However, I am happy to use another approach, so long as a chi-square test is an option and I can still remove NAs.

Here is what works so far. The function below generates simple frequencies, but I would like each variable name to print immediately before its frequency table. Instead, all of the variable names print first and THEN the frequency tables follow, so that is issue #1 that I am struggling with:

library(RNHANES)
library(summarytools)

smk <- nhanes_load_data("SMQ_H", "2013-2014")

vars <- c("SMQ040", "SMD093")

ctabs <- function(i) {
  print(i)
  summarytools::freq(smk[,i]) 
}

lapply(vars, ctabs)

The next issue is to extend this to ctable. The function works but prints smk[,i] instead of the variable name that is in the list, which is not ideal.

ctabs2 <- function(i) {
  summarytools::ctable(smk[,i], smk$SMQ020, chisq=T, useNA = "no") 
}

lapply(vars, ctabs2)

Well, actually, when I try it with my own data, I get the error message:

Error: Can't subset columns that don't exist. x Location 2 doesn't exist. ℹ There are only 1 column.

Even though the columns definitely do exist because the simple frequency function works without issue. It appears as though the way the function is written, ctable does not recognize the variables.


Solution

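  • To solve the 1st issue, use a for loop and print each result inside the loop. lapply collects every return value and the list is only printed once all calls have finished, which is why each print(i) appears before any table.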

    vars <- c("SMQ040", "SMD093")
    
    ctabs <- function(i) {
      print(i)
      summarytools::freq(smk[,i]) 
    }
    
    result <- vector('list', length(vars))
    for(i in seq_along(vars)) {
      result[[i]] <- ctabs(vars[i])
      print(result[[i]])
    }
    
    #[1] "SMQ040"
    #Frequencies  
    
    #              Freq   % Valid   % Valid Cum.   % Total   % Total Cum.
    #----------- ------ --------- -------------- --------- --------------
    #          1    992     38.46          38.46     13.84          13.84
    #          2    240      9.31          47.77      3.35          17.19
    #          3   1347     52.23         100.00     18.79          35.98
    #       <NA>   4589                              64.02         100.00
    #      Total   7168    100.00         100.00    100.00         100.00
    #[1] "SMD093"
    #Frequencies  
    
    #              Freq   % Valid   % Valid Cum.   % Total   % Total Cum.
    #----------- ------ --------- -------------- --------- --------------
    #          1    829     67.29          67.29     11.57          11.57
    #          2    280     22.73          90.02      3.91          15.47
    #          3     69      5.60          95.62      0.96          16.43
    #          4     54      4.38         100.00      0.75          17.19
    #       <NA>   5936                              82.81         100.00
    #      Total   7168    100.00         100.00    100.00         100.00
    

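  • For the 2nd issue, use the dnn argument of ctable to set the variable names shown in the table. Note that the column is extracted with smk[[i]] rather than smk[,i], so ctable receives a plain vector; if smk is a tibble, smk[,i] stays a one-column tibble, which is the likely source of the "Can't subset columns" error.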

    ctabs2 <- function(i) {
      summarytools::ctable(smk[[i]], smk$SMQ020, chisq = TRUE, useNA = "no", dnn = c(i, 'SMQ020')) 
    }
    
    lapply(vars, ctabs2)
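
Since the question is open to other approaches, here is a minimal base R sketch of the same pattern: loop over the variable names, extract each column with [[ ]], label the table with dnn, drop NAs, and run a chi-square test. It is not the summarytools output. The built-in mtcars data stands in for the NHANES data, and the names crosstab_one, by_var and results are made up for illustration.

```r
# Base R only; mtcars stands in for the NHANES data (an assumption for illustration).
df <- mtcars
df$cyl[c(1, 5)] <- NA          # add a few missing values to show NA handling
vars <- c("cyl", "gear")       # row variables to loop over
by_var <- "am"                 # fixed column variable

# Hypothetical helper: one crosstab plus chi-square test per variable name
crosstab_one <- function(v) {
  # [[ ]] extracts a plain vector from data frames and tibbles alike
  tab <- table(df[[v]], df[[by_var]], dnn = c(v, by_var), useNA = "no")
  # chisq.test may warn about small expected counts on this toy data
  list(table = tab, chisq = suppressWarnings(chisq.test(tab)))
}

# Naming the input makes lapply() return a list named by variable
results <- lapply(setNames(vars, vars), crosstab_one)

# Print each variable name immediately before its table and test
for (v in names(results)) {
  cat("\n==", v, "by", by_var, "==\n")
  print(results[[v]]$table)
  print(results[[v]]$chisq)
}
```

The same setNames(vars, vars) trick should also work with ctabs2 above, so that each element of the returned list carries its variable name, though I have not checked how summarytools prints objects stored in a named list.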