
How to write a for loop over sjt.xtab() in R for a data frame of factors?


I'm trying to write a for loop that creates tables with sjt.xtab() for every pair of columns in a data frame. Ideally this would generalize to other data frames too, so a function would probably be better.

I have a dataframe called df_mod:

'data.frame':   849 obs. of  17 variables:
 $ amazon       : Factor w/ 4 levels "0","1","2","3": 2 2 2 1 3 2 3 3 2 2 ...
 $ manhattan    : Factor w/ 2 levels "Manhattan","Other": 2 2 2 1 2 2 2 2 2 2 ...
 $ income       : Factor w/ 5 levels "$25-49k","$50 - 74k",..: 2 4 4 1 3 2 3 5 5 2 ...
 $ phone        : Factor w/ 2 levels "0","1": 2 2 2 2 2 1 2 2 2 2 ...
 $ gender       : Factor w/ 2 levels "0","1": 1 1 1 1 1 2 2 2 1 1 ...
 $ age          : Factor w/ 6 levels "18-24","25-34",..: 2 2 3 6 5 6 3 2 3 3 ...
 $ education    : Factor w/ 3 levels "College","Graduate",..: 2 3 3 1 3 1 1 2 2 2 ..

and tried:

xtab_list <- list()

# Iterate through the columns in the dataframe
for (i in 1:ncol(df_mod)) {
  for (j in i+1:ncol(df_mod)) {
    # Calculate the contingency table for each pair of columns
    xtab <- sjt.xtab(df_mod[,names(df_mod)[i]], df_mod[,names(df_mod)[j]])
    # Append the contingency table to the list
    xtab_list[[paste0(names(df_mod)[i], names(df_mod)[j])]] <- xtab
  }
}

and receive this error:

Error in `[.data.frame`(df_mod, , names(df_mod)[j]) :
undefined columns selected

Also tried it without the names(df_mod) but received the same error.

It works when I write out individual columns, so it's not the column type (all factors):

sjt.xtab(df_mod$gender, df_mod$education)

so I'm not sure what I'm doing wrong, especially since it's bad coding to do each one by one and I'd much rather do it once properly. Thank you!


Solution

  • The code in the original post fails to produce expected results due to a subtle error in the line:

    for (j in i+1:ncol(df_mod)){ ... }
    

    It executes as "j in i plus (1:ncol(df_mod))". That is, R evaluates the : operator before the binary + operator, as documented on the "Operator Syntax and Precedence" help page (?Syntax).

    What was originally intended would be written as:

    for (j in (i+1):ncol(df_mod)){ ... }
    

    For example, when i is 1, the original for() loop for j iterates from 2 to ncol(df_mod) + 1. The last value points to a nonexistent column, which triggers the "undefined columns selected" error.
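
    The difference between the two expressions can be checked directly at the console. This is a minimal sketch with made-up values (i = 1 and a hypothetical column count n = 5), not the asker's data:

    ```r
    # Illustrative values only: first pass of the outer loop, 5 columns
    i <- 1
    n <- 5

    original <- i + 1:n      # parsed as i + (1:n)
    intended <- (i + 1):n    # parentheses force the addition first

    print(original)  # 2 3 4 5 6: the last index exceeds n
    print(intended)  # 2 3 4 5
    ```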

    Example: pairs of columns in mtcars

    We can loop through the mtcars data frame to generate the pairs of columns needed for a set of cross tabs. For now we'll ignore the underlying data types and simply illustrate how to generate the combinations of columns with two nested for() loops.

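    Since a crosstab of a variable with itself is not particularly helpful, the inner loop starts at i + 1, which means the outer for(i in ...) loop must end at ncol(mtcars) - 1. If it ran to ncol(mtcars), the final pass would evaluate (i+1):ncol(mtcars) as a descending sequence that starts at ncol(mtcars) + 1, again pointing to a nonexistent column.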

    for(i in 1:(ncol(mtcars) - 1)){
         for(j in (i+1):ncol(mtcars)){
              message(paste("i is:",names(mtcars)[i],"j is:",names(mtcars)[j]))
         }
    }
    

    We'll print the last 6 rows of messages to show how the sequence ends.

    i is: vs j is: am
    i is: vs j is: gear
    i is: vs j is: carb
    i is: am j is: gear
    i is: am j is: carb
    i is: gear j is: carb
    

    An Alternate Approach

    Another way to solve this problem is to generate all the combinations of the desired variables with combn(), and process them with an apply-family function such as lapply().

    We'll create a reproducible example with the mtcars data frame.

    # categorical variables in mtcars are vs, am, cyl, gear, carb
    theColumns <- c("vs","am","cyl","gear","carb")
    
    library(sjPlot)
    
    # generate combinations of the categorical variables for xtabs 
    theCombinations <- combn(theColumns,2)
    

    At this point we have a 2-row, 10-column matrix that represents the unique combinations of the 5 categorical variables in the mtcars data frame.

    theCombinations
    
         [,1] [,2]  [,3]   [,4]   [,5]  [,6]   [,7]   [,8]   [,9]   [,10]
    [1,] "vs" "vs"  "vs"   "vs"   "am"  "am"   "am"   "cyl"  "cyl"  "gear"
    [2,] "am" "cyl" "gear" "carb" "cyl" "gear" "carb" "gear" "carb" "carb"
    

    Next, we'll use lapply() to loop through the matrix and generate the cross tabs, saving them to a list called theTabs.

    # for each column in theCombinations, run the xtab 
    theTabs <- lapply(1:ncol(theCombinations),function(x,y,z){
         sjt.xtab(z[[y[1,x]]],z[[y[2,x]]])
    },theCombinations,mtcars)
    

    The anonymous function within lapply() takes three arguments. The first, x, is the current column index, taken in turn from the sequence 1 to ncol(theCombinations). lapply() calls the anonymous function once per column of the matrix, that is, once for each pair of variables for which we generate a 2-way contingency table.

    The second argument, y, represents the matrix of column combinations.

    The third argument, z, represents the data frame where the columns to be tabulated are stored.
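
    Since single-letter argument names can be hard to follow, here is the same pattern sketched with descriptive names. The names pair_index, pairs, data, and demoTabs are illustrative, and base R's table() stands in for sjt.xtab() so the sketch runs without sjPlot.

    ```r
    theColumns <- c("vs", "am", "cyl", "gear", "carb")
    theCombinations <- combn(theColumns, 2)

    # pair_index plays the role of x, pairs the role of y, and data the role of z
    demoTabs <- lapply(seq_len(ncol(theCombinations)), function(pair_index, pairs, data) {
      row_name <- pairs[1, pair_index]
      col_name <- pairs[2, pair_index]
      # table() is a stand-in for sjt.xtab() in this sketch
      table(data[[row_name]], data[[col_name]], dnn = c(row_name, col_name))
    }, pairs = theCombinations, data = mtcars)

    length(demoTabs)  # one table per column of theCombinations
    ```

    Passing pairs and data by name through lapply()'s ... argument makes it explicit which object fills which parameter.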

    Finally, we print the first item in the list.

    # print the first table 
    theTabs[[1]]
    

    ...and the output:

    [Image: cross tab of vs by am as rendered by sjt.xtab()]

    Hmm... those variable labels look a bit funky, so we'll add a var.labels argument to make the output easier to read.

    # add some variable labels 
    theTabs <- lapply(1:ncol(theCombinations),function(x,y,z){
         sjt.xtab(z[[y[1,x]]],z[[y[2,x]]], var.labels = c(y[1,x],y[2,x]))
    },theCombinations,mtcars)
    
    # print the first table 
    theTabs[[1]]
    

    ...and the output:

    [Image: the same cross tab, now labeled with the variable names vs and am]

    Extracting the names of the factor columns in a data frame to use as theColumns is left as an exercise for the reader.
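
    For readers who want a starting point for that exercise, one possible approach is sketched below. The helper name factor_columns and the small data frame df_demo are hypothetical and exist only for illustration.

    ```r
    # Hypothetical helper: names of all factor columns in a data frame
    factor_columns <- function(df) {
      names(df)[vapply(df, is.factor, logical(1))]
    }

    # Small made-up data frame: two factors and one numeric column
    df_demo <- data.frame(
      gender = factor(c("0", "1", "0", "1")),
      phone  = factor(c("1", "1", "0", "1")),
      score  = c(3.2, 4.1, 2.8, 3.9)
    )

    theColumns <- factor_columns(df_demo)
    theColumns            # the numeric column score is dropped
    combn(theColumns, 2)  # pairs to feed into the lapply() step
    ```

    Note that combn(theColumns, 2) stops with an error when fewer than two factor columns are found, so a generalized function may want to check length(theColumns) first.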