I have data similar to the NHANES data I load below. I would like to loop over a list of several variables to create crosstabs. I would like to stick with the summarytools::ctable function because I want to use its chisq argument. However, I am happy to use another approach, so long as a chi-square test is an option and I can still remove NAs.
Here is what works so far. I am able to use the function below to generate simple frequencies. However, I would like the name of each variable to print immediately before its frequency table. Instead, the function first prints all of the variable names and THEN generates the frequencies, so that is issue #1:
library(RNHANES)
library(summarytools)
smk <- nhanes_load_data("SMQ_H", "2013-2014")
vars <- c("SMQ040", "SMD093")
ctabs <- function(i) {
print(i)
summarytools::freq(smk[,i])
}
lapply(vars, ctabs)
The next issue is to extend this to ctable. The function works but prints smk[,i] instead of the variable name that is in the list, which is not ideal.
ctabs2 <- function(i) {
summarytools::ctable(smk[,i], smk$SMQ020, chisq=T, useNA = "no")
}
lapply(vars, ctabs2)
Well, actually, when I try it with my own data, I get the error message:
Error: Can't subset columns that don't exist. x Location 2 doesn't exist. ℹ There are only 1 column.
The columns definitely do exist, because the simple frequency function works without issue. It appears that, the way the function is written, ctable does not recognize the variables.
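To solve the first issue, use a for loop and print each result explicitly. With lapply, each print(i) runs as soon as the function is called, but the returned freq tables are only displayed afterwards, when the resulting list is auto-printed, so all of the names appear before any table.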
vars <- c("SMQ040", "SMD093")
ctabs <- function(i) {
print(i)
summarytools::freq(smk[,i])
}
result <- vector('list', length(vars))
for(i in seq_along(vars)) {
result[[i]] <- ctabs(vars[i])
print(result[[i]])
}
#[1] "SMQ040"
#Frequencies
#              Freq   % Valid   % Valid Cum.   % Total   % Total Cum.
#----------- ------ --------- -------------- --------- --------------
#          1    992     38.46          38.46     13.84          13.84
#          2    240      9.31          47.77      3.35          17.19
#          3   1347     52.23         100.00     18.79          35.98
#       <NA>   4589                              64.02         100.00
#      Total   7168    100.00         100.00    100.00         100.00
#[1] "SMD093"
#Frequencies
#              Freq   % Valid   % Valid Cum.   % Total   % Total Cum.
#----------- ------ --------- -------------- --------- --------------
#          1    829     67.29          67.29     11.57          11.57
#          2    280     22.73          90.02      3.91          15.47
#          3     69      5.60          95.62      0.96          16.43
#          4     54      4.38         100.00      0.75          17.19
#       <NA>   5936                              82.81         100.00
#      Total   7168    100.00         100.00    100.00         100.00
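For the second issue, use the dnn argument of ctable to set the variable names shown in the table, and extract the column with smk[[i]] rather than smk[,i]: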
ctabs2 <- function(i) {
summarytools::ctable(smk[[i]], smk$SMQ020, chisq = TRUE, useNA = "no", dnn = c(i, 'SMQ020'))
}
lapply(vars, ctabs2)
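The switch from smk[,i] to smk[[i]] is probably also what resolves the "There are only 1 column" error, although that is an assumption: the error's format suggests the asker's own data is a tibble, and a tibble's `[` always returns a tibble, never a bare vector. The sketch below illustrates the difference on a small made-up data frame (`toy` and its values are hypothetical, not NHANES data), using drop = FALSE to mimic tibble behaviour without needing the tibble package.

```r
# Toy stand-in for smk; the values are invented for illustration only.
toy <- data.frame(SMQ020 = c(1, 2, 1, 2), SMQ040 = c(1, 3, 2, 3))
i <- "SMQ040"

# `[[` always extracts the column itself, as a plain vector.
as_vector <- toy[[i]]

# A tibble's `[` keeps the result as a one-column table; drop = FALSE
# reproduces that behaviour on an ordinary data frame.
as_frame <- toy[, i, drop = FALSE]

is.data.frame(as_vector)  # FALSE
is.data.frame(as_frame)   # TRUE
ncol(as_frame)            # 1
```

A function that expects a vector for each of its two variables can misread a one-column table, so passing smk[[i]] is the safer choice whichever class the data has.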