I have (let's say) some data about students living in different halls of residence who pass or fail a fire safety test. I've been asked to do a chi-square test to see whether there is an association between hall of residence and pass rate (null hypothesis: hall of residence and pass rate are independent).
The data I've been supplied with is already compressed into contingency table format, i.e. aggregated at the level of the hall, not the individual student:
df <- data.frame(
  halls = c("AC", "AN", "JH", "MM", "ST"),
  pass  = c(219, 136, 147, 516, 156),
  fail  = c(122, 63, 64, 123, 36)
)
Here AC, AN, JH, etc. are abbreviated hall names.
I import this data from Excel using read_xlsx().
If I try to run
chisq <- chisq.test(df)
I get the error message "Error in chisq.test(df) : all entries of 'x' must be nonnegative and finite", which I think is because it's a data frame, not a table.
If I try to convert directly to a table using:
df_table <- as.table(df)
I get the error message: "Error in as.table.default(df) : cannot coerce to a table"
Answers elsewhere suggest I could fix this by converting to a matrix first:
df_matrix <- as.matrix(df)
chisq <- chisq.test(df_matrix)
But that gives the same error message as the df:
"Error in chisq.test(df_matrix) : all entries of 'x' must be nonnegative and finite"
There are a lot of previous questions about how to get data into the right format for contingency tables, but everything I can find seems to need some sort of manipulation or aggregation first, rather than a straight conversion. Google gives me lots of ways to go from a table to a data frame, but not the reverse.
Is there a simple way to convert my contingency table (stored as a data frame) into something R recognises as a contingency table?
The function chisq.test() requires numerical data, but the halls column you have is a character vector. If you pass in the whole data frame, chisq.test() finds the character column and cannot treat it as counts. If you convert the data frame to a matrix first, R coerces every column to character, giving a three-column character matrix, which fails for the same reason.
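A quick way to see the coercion described above is to check the column classes and the mode of the matrix that as.matrix() produces. This is a minimal sketch that types the example data in directly rather than reading it from Excel:

```r
# Example data from the question, typed in instead of read with read_xlsx()
df <- data.frame(
  halls = c("AC", "AN", "JH", "MM", "ST"),
  pass  = c(219, 136, 147, 516, 156),
  fail  = c(122, 63, 64, 123, 36)
)

# One non-numeric column alongside two numeric ones
sapply(df, class)

# Converting the whole data frame gives a character matrix ...
mode(as.matrix(df))         # "character"

# ... whereas converting only the count columns keeps the numbers numeric
mode(as.matrix(df[2:3]))    # "numeric"
```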
You should only give the two columns containing numerical data to chisq.test(). It doesn't matter whether they are in the form of a table, a matrix or a data frame.
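To check that the container really does not matter, the sketch below passes the same two count columns as a data frame, a numeric matrix and a table, and collects the X-squared statistic from each. The names counts_df, counts_mat and counts_tab are illustrative only:

```r
df <- data.frame(
  halls = c("AC", "AN", "JH", "MM", "ST"),
  pass  = c(219, 136, 147, 516, 156),
  fail  = c(122, 63, 64, 123, 36)
)

counts_df  <- df[2:3]               # data frame holding only pass and fail
counts_mat <- as.matrix(counts_df)  # numeric 5 x 2 matrix
counts_tab <- as.table(counts_mat)  # the same counts with class "table"

# Run the test on each form and keep only the X-squared value
stats <- sapply(
  list(counts_df, counts_mat, counts_tab),
  function(x) unname(chisq.test(x)$statistic)
)
stats  # three identical values
```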
The shortest way of doing this is
chisq.test(df[2:3])
#>
#> Pearson's Chi-squared test
#>
#> data: df[2:3]
#> X-squared = 42.884, df = 4, p-value = 1.094e-08
Alternatively, you might prefer to create a data frame with row names taken from the halls variable, so that the output shows your working:
tab <- df[-1]
rownames(tab) <- df[[1]]
tab
#> pass fail
#> AC 219 122
#> AN 136 63
#> JH 147 64
#> MM 516 123
#> ST 156 36
chisq.test(tab)
#>
#> Pearson's Chi-squared test
#>
#> data: tab
#> X-squared = 42.884, df = 4, p-value = 1.094e-08
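If you want an object of class "table", which is what the question asks for, one option is to build a numeric matrix from the count columns, attach named dimnames and call as.table(). This is a sketch rather than the only way. It also avoids row names, which can matter here: read_xlsx() returns a tibble, and recent versions of the tibble package warn when you set row names on one. The dimension names hall and result are my own choice, not anything from the original data:

```r
# df would normally come from readxl::read_xlsx(); the data are typed in here instead
df <- data.frame(
  halls = c("AC", "AN", "JH", "MM", "ST"),
  pass  = c(219, 136, 147, 516, 156),
  fail  = c(122, 63, 64, 123, 36)
)

# Numeric matrix of counts: rows = halls, columns = test outcome
counts <- as.matrix(df[c("pass", "fail")])
dimnames(counts) <- list(hall = df$halls, result = c("pass", "fail"))

# Give it class "table" so it prints and behaves like a contingency table
hall_tab <- as.table(counts)
hall_tab

res <- chisq.test(hall_tab)
res
```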