r, matrix, statistics, chi-squared, r-rownames

Chi-square test in R that keeps row names


I'm building an employee survey with two waves, and I want to make sure that each wave is balanced in terms of some demographic variables, such as ethnicity and gender. Here is a fictitious sample of the data:

library(tidyverse)
sample_data <- tibble(demographics = c("White / Female", "Non-White / Female", "White / Male", "Non-White / Male", "White / Transgender", "Non-White / Transgender"),
                      wave_1 = c(40, 38, 60, 56, 0, 2),
                      wave_2 = c(38, 39, 62, 58, 1, 0))

If I run chisq.test() on sample_data, I get an error:

library(stats)
chisq.test(sample_data)

Error in chisq.test(sample_data) : 
  all entries of 'x' must be nonnegative and finite

But I don't get the error if I just use the two count columns:

sample_data_count <- sample_data %>%
  dplyr::select(wave_1, wave_2)
chisq.test(sample_data_count)

    Pearson's Chi-squared test

data:  sample_data_count
X-squared = 3.1221, df = 5, p-value = 0.6812

Warning message:
In chisq.test(sample_data_count) :
  Chi-squared approximation may be incorrect

I understand that R doesn't like having the demographics column in sample_data, but dropping it makes it hard to look at the observed values by demographic group. Is there a way to run the chi-square test with those row names kept in?

I saw an example at http://www.sthda.com/english/wiki/chi-square-test-of-independence-in-r, using this dataset (file_path <- "http://www.sthda.com/sthda/RDoc/data/housetasks.txt"), that runs a chi-square test in R with the row names still in place.

Any help would be appreciated!


Solution

  • The error occurs because sample_data also includes a character column. According to ?chisq.test:

    x - a numeric vector or matrix. x and y can also both be factors.

    y - a numeric vector; ignored if x is a matrix. If x is a factor, y should be a factor of the same length.

    To pass a numeric matrix, either select only the numeric columns, or convert the 'demographics' column to row names, convert the result to a matrix, and apply the test:

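    library(dplyr)   # provides %>%
    library(tibble)  # provides column_to_rownames()
    sample_data %>%
       column_to_rownames('demographics') %>%  # move the labels into the row names
       as.matrix() %>%                         # numeric matrix that keeps the row names
       chisq.test()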
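
As a rough illustration of why this helps, here is a minimal base-R sketch of the same idea, using the sample data from the question. The names `counts` and `result` are only placeholders chosen for this example. It first shows that converting the full data frame yields a character matrix (which is what chisq.test() rejects), then builds a numeric matrix with the demographics as row names and looks at the observed and expected counts, which carry those row names along.

```r
# Sample data from the question, as a plain data frame
sample_data <- data.frame(
  demographics = c("White / Female", "Non-White / Female", "White / Male",
                   "Non-White / Male", "White / Transgender",
                   "Non-White / Transgender"),
  wave_1 = c(40, 38, 60, 56, 0, 2),
  wave_2 = c(38, 39, 62, 58, 1, 0),
  stringsAsFactors = FALSE
)

# With the character column included, the whole matrix becomes character
is.character(as.matrix(sample_data))

# Keep only the counts and attach the labels as row names
counts <- as.matrix(sample_data[, c("wave_1", "wave_2")])
rownames(counts) <- sample_data$demographics

# Still warns about the approximation, because some expected counts are small
result <- chisq.test(counts)

result$observed            # observed counts, labelled by demographic
result$expected            # expected counts under independence, same labels
round(result$residuals, 2) # Pearson residuals per cell
```

Storing the test result in a variable, rather than only printing it, is what makes the labelled `observed`, `expected`, and `residuals` components available for inspection afterwards.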