I'm trying to write a for loop to create tables using sjt.xtab() so it iterates through every variation in a dataframe. Ideally this would be generalizable to all other dataframes too, so a function would probably be better.
I have a dataframe called df_mod:
'data.frame': 849 obs. of 17 variables:
$ amazon : Factor w/ 4 levels "0","1","2","3": 2 2 2 1 3 2 3 3 2 2 ...
$ manhattan : Factor w/ 2 levels "Manhattan","Other": 2 2 2 1 2 2 2 2 2 2 ...
$ income : Factor w/ 5 levels "$25-49k","$50 - 74k",..: 2 4 4 1 3 2 3 5 5 2 ...
$ phone : Factor w/ 2 levels "0","1": 2 2 2 2 2 1 2 2 2 2 ...
$ gender : Factor w/ 2 levels "0","1": 1 1 1 1 1 2 2 2 1 1 ...
$ age : Factor w/ 6 levels "18-24","25-34",..: 2 2 3 6 5 6 3 2 3 3 ...
$ education : Factor w/ 3 levels "College","Graduate",..: 2 3 3 1 3 1 1 2 2 2 ..
and tried:
xtab_list <- list()
# Iterate through the columns in the dataframe
for (i in 1:ncol(df_mod)) {
for (j in i+1:ncol(df_mod)) {
# Calculate the contingency table for each pair of columns
xtab <- sjt.xtab(df_mod[,names(df_mod)[i]], df_mod[,names(df_mod)[j]])
# Append the contingency table to the list
xtab_list[[paste0(names(df_mod)[i], names(df_mod)[j])]] <- xtab
}
}
and receive this error:
Error in `[.data.frame`(df_mod, , names(df_mod)[j]) :
undefined columns selected
I also tried it without the names(df_mod) but received the same error.
It works when I write out individual columns, so it's not the column type (all factors):
sjt.xtab(df_mod$gender, df_mod$education)
so I'm not sure what I'm doing wrong, especially since it's bad coding to do each one by one and I'd much rather do it once properly. Thank you!
The code in the original post fails to produce expected results due to a subtle error in the line:
for (j in i+1:ncol(df_mod)){ ... }
It executes as "j in i plus (1:ncol(df_mod))." That is, R evaluates the : operator before the binary + operator. This is documented in the R help page "Operator Syntax and Precedence" (?Syntax).
What was originally intended would be written as:
for (j in (i+1):ncol(df_mod)){ ... }
For example, when i is 1, the original for() loop for j iterates from 2 to ncol(df_mod) + 1, which points to a nonexistent column.
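To make the precedence issue concrete, here is a minimal sketch that compares the two expressions on plain numbers. The value n is only a stand-in for ncol(df_mod); it is an assumption for illustration, not something taken from the original data.

```r
n <- 5  # stand-in for ncol(df_mod), chosen only for illustration
i <- 1

# `:` binds tighter than `+`, so this is i + (1:n)
wrong <- i + 1:n
print(wrong)

# parentheses force the addition to happen first
right <- (i + 1):n
print(right)

# the unparenthesized version runs one index past the last column
max(wrong) > n
```

The last index in wrong is n + 1, which is exactly the "undefined columns selected" case from the question.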
We can loop through the mtcars data frame to generate the pairs of columns that are needed for a set of cross tabs. For now we'll ignore the underlying data types to illustrate how to generate the combinations of columns via two nested for() loops.
Since a crosstab of a variable with itself is not particularly helpful, we start j at i + 1 and end the for(i in ...) loop at ncol(mtcars) - 1. Stopping there also matters because (i+1):ncol(mtcars) would count backwards if i reached ncol(mtcars).
for(i in 1:(ncol(mtcars) - 1)){
for(j in (i+1):ncol(mtcars)){
message(paste("i is:",names(mtcars)[i],"j is:",names(mtcars)[j]))
}
}
We'll print the last 6 rows of messages to show how the sequence ends.
i is: vs j is: am
i is: vs j is: gear
i is: vs j is: carb
i is: am j is: gear
i is: am j is: carb
i is: gear j is: carb
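Since the question asks for something reusable, the corrected loops can be wrapped in a function. The following is only a sketch: pairwise_xtabs() and its tab_fun argument are hypothetical names, and base R's table() is used as the default so the example runs without sjPlot installed.

```r
# pairwise_xtabs() is a hypothetical helper, not part of sjPlot.
# tab_fun is any function that takes two vectors and returns a table object.
pairwise_xtabs <- function(df, tab_fun = table) {
  stopifnot(ncol(df) >= 2)
  out <- list()
  for (i in seq_len(ncol(df) - 1)) {
    for (j in (i + 1):ncol(df)) {
      # name each result after the pair of columns it was built from
      key <- paste(names(df)[i], names(df)[j], sep = "_")
      out[[key]] <- tab_fun(df[[i]], df[[j]])
    }
  }
  out
}

# the categorical-like columns of mtcars
cat_cols <- mtcars[, c("vs", "am", "cyl", "gear", "carb")]
tabs <- pairwise_xtabs(cat_cols)
length(tabs)
names(tabs)[1:3]
```

With sjPlot loaded, passing tab_fun = sjt.xtab should produce the HTML tables instead of base tables, since sjt.xtab() is called with two vectors elsewhere in this post.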
Another way to solve this problem is to generate all the combinations of the desired variables, and process them in an apply() function.
We'll create a reproducible example with the mtcars data frame.
# categorical variables in mtcars are vs, am, cyl, gear, carb
theColumns <- c("vs","am","cyl","gear","carb")
library(sjPlot)
# generate combinations of the categorical variables for xtabs
theCombinations <- combn(theColumns,2)
At this point we have a 2-row, 10-column matrix that represents the unique combinations of the 5 categorical variables in the mtcars data frame.
> theCombinations
     [,1] [,2]  [,3]   [,4]   [,5]  [,6]   [,7]   [,8]   [,9]   [,10]
[1,] "vs" "vs"  "vs"   "vs"   "am"  "am"   "am"   "cyl"  "cyl"  "gear"
[2,] "am" "cyl" "gear" "carb" "cyl" "gear" "carb" "gear" "carb" "carb"
Next, we'll use lapply() to loop through the matrix and generate the cross tabs, saving them to a list called theTabs.
# for each column in theCombinations, run the xtab
theTabs <- lapply(1:ncol(theCombinations),function(x,y,z){
sjt.xtab(z[[y[1,x]]],z[[y[2,x]]])
},theCombinations,mtcars)
The anonymous function within lapply() takes three arguments. The first, x, is the current element of the sequence from 1 to ncol() of the matrix containing the pairs of variables for which we will generate 2-way contingency tables. The lapply() function calls the anonymous function once per column of that matrix, that is, ncol(theCombinations) times.
The second argument, y, represents the matrix of column combinations.
The third argument, z, represents the data frame where the columns to be tabulated are stored.
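As an alternative to indexing into the matrix, combn() can return the pairs as a list via simplify = FALSE, which lets lapply() iterate over the pairs directly. This is a sketch rather than the approach used above; thePairs and theTabs2 are names introduced here for illustration, and base table() stands in for sjt.xtab() so the example runs without sjPlot.

```r
theColumns <- c("vs", "am", "cyl", "gear", "carb")

# simplify = FALSE returns a list of length-2 character vectors
thePairs <- combn(theColumns, 2, simplify = FALSE)

# table() is a stand-in here; swap in sjt.xtab() for the HTML tables
theTabs2 <- lapply(thePairs, function(p) {
  table(mtcars[[p[1]]], mtcars[[p[2]]])
})

# name each list element after its pair of variables
names(theTabs2) <- vapply(thePairs, paste, character(1), collapse = "_")
names(theTabs2)[1]
```

Naming the list elements makes it possible to look up a table by its variables, for example theTabs2[["cyl_gear"]], instead of by position.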
Finally, we print the first item in the list.
# print the first table
theTabs[[1]]
...and the output:
Hmm... those variable labels look a bit funky, so we'll add a var.labels argument to make the output easier to read.
# add some variable labels
theTabs <- lapply(1:ncol(theCombinations),function(x,y,z){
sjt.xtab(z[[y[1,x]]],z[[y[2,x]]], var.labels = c(y[1,x],y[2,x]))
},theCombinations,mtcars)
# print the first table
theTabs[[1]]
...and the output:
Extracting the names of the factor columns in a data frame to use as theColumns is left as an exercise for the reader.
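For readers who want a starting point, one possible approach is to test each column with is.factor(). The data frame df_demo below is made up for illustration (mtcars has no factor columns), and its column names only mimic those in the question.

```r
# df_demo is a small, invented data frame used only for this sketch
df_demo <- data.frame(
  gender = factor(c("0", "1", "1", "0")),
  education = factor(c("College", "Graduate", "College", "Other")),
  score = c(1.5, 2.0, 3.5, 4.0)
)

# TRUE for each column that is a factor
is_fct <- vapply(df_demo, is.factor, logical(1))

# keep only the names of the factor columns
theColumns <- names(df_demo)[is_fct]
theColumns
```

The resulting character vector can be passed to combn() exactly as theColumns is used above.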