Apologies in advance for this probably very basic question, but I have been struggling to find out how to do this in R.
When reviewing papers, theses, etc., it is very useful to calculate p-values from aggregate data, i.e. you see a table and wonder whether the p-value was calculated correctly. In Stata it is very easy to run, for instance, a chi-square test on aggregate data using the immediate commands, for example:
tabi 8 43 \ 2 78, row chi2
gives output
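       row |      1       2 |  Total
-----------+----------------+-------
         1 |      8      43 |     51
           |  15.69   84.31 | 100.00
-----------+----------------+-------
         2 |      2      78 |     80
           |   2.50   97.50 | 100.00
-----------+----------------+-------
     Total |     10     121 |    131
           |   7.63   92.37 | 100.00

Pearson chi2(1) = 7.6805   Pr = 0.006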
I struggle to do the same in R using chisq.test(). I have tried, for instance,
chisq.test(c(8, 43, 2, 78))
or
chisq.test(c(8, 43, 2, 78, nrow = 2))
or similar, but it seems to do a completely different calculation. The second call, for example, gives:
Chi-squared test for given probabilities
data: c(8, 43, 2, 78, nrow = 2)
X-squared = 167.94, df = 4, p-value < 2.2e-16
Can anyone help with a "quick-fix" for this?
Thanks in advance
Bjorn
I am not entirely sure what you want to achieve, but I think you are looking for this:
chisq.test(matrix(c(8, 43, 2, 78), nrow = 2))
In any case, run ?chisq.test
to see how the function works, which arguments it expects and in which order, etc.
If you run this, you'll also find a description of how the function works:
"If x is a matrix with one row or column, or if x is a vector and y is not given, then a goodness-of-fit test is performed (x is treated as a one-dimensional contingency table). The entries of x must be non-negative integers. In this case, the hypothesis tested is whether the population probabilities equal those in p, or are all equal if p is not given.
If x is a matrix with at least two rows and columns, it is taken as a two-dimensional contingency table: the entries of x must be non-negative integers. Otherwise, x and y must be vectors or factors of the same length; cases with missing values are removed, the objects are coerced to factors, and the contingency table is computed from these. Then Pearson's chi-squared test is performed of the null hypothesis that the joint distribution of the cell counts in a 2-dimensional contingency table is the product of the row and column marginals."
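To make the two-dimensional case concrete, here is a minimal sketch that tries to reproduce the Stata output from the question. It rests on two assumptions that are not in the original posts: the counts are entered row by row (hence byrow = TRUE, since matrix() fills column by column by default), and Stata's "Pearson chi2" is the uncorrected statistic, so Yates' continuity correction, which chisq.test() applies to 2 x 2 tables by default, is switched off with correct = FALSE. The names tab and res and the dimnames labels are only illustrative.

```r
# Counts from the Stata example, entered row by row.
# byrow = TRUE is needed because matrix() fills column by column by default.
tab <- matrix(c(8, 43,
                2, 78),
              nrow = 2, byrow = TRUE,
              dimnames = list(row = c("1", "2"), col = c("1", "2")))

# correct = FALSE switches off Yates' continuity correction, which
# chisq.test() applies to 2 x 2 tables by default.
res <- chisq.test(tab, correct = FALSE)

res$statistic             # Pearson chi-squared statistic
res$p.value               # corresponding p-value
prop.table(tab, 1) * 100  # row percentages, like Stata's `row` option
```

With these settings the statistic should agree with Stata's 7.6805 (Pr = 0.006). R will probably also warn that the chi-squared approximation may be incorrect, because one expected count is below 5; fisher.test(tab) is an alternative in that situation.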
Check your example data, e.g. when you run
is.matrix(c(8, 43, 2, 78, nrow = 2))
it will return
[1] FALSE
while
is.matrix(matrix(c(8, 43, 2, 78), nrow = 2))
returns
[1] TRUE
So the example you gave was a vector, not a matrix. As the description pasted above says, chisq.test() performs a "goodness-of-fit test" on a vector, whereas for a matrix it performs "Pearson's chi-squared test". Note also that for 2 x 2 tables chisq.test() applies Yates' continuity correction by default, so you need correct = FALSE to match Stata's uncorrected Pearson statistic.
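The difference between the two inputs can be checked directly. The following is a small sketch (the names v and m are just illustrative) showing why the call in the question reported df = 4: inside c(), nrow = 2 is not an argument but simply a fifth element named "nrow".

```r
v <- c(8, 43, 2, 78, nrow = 2)          # nrow = 2 becomes a fifth, named element
m <- matrix(c(8, 43, 2, 78), nrow = 2)  # here nrow is an argument of matrix()

length(v)   # 5, not 4
names(v)    # only the last element is named "nrow"
dim(m)      # 2 2

# A plain vector triggers the goodness-of-fit test with k - 1 degrees of freedom
chisq.test(v)$parameter                # df = 4
chisq.test(c(8, 43, 2, 78))$parameter  # df = 3

# A matrix triggers the test on the 2 x 2 table: (2 - 1) * (2 - 1) = 1
chisq.test(m)$parameter                # df = 1
```

The degrees of freedom are a quick sanity check: a 2 x 2 table should always give df = 1, so any other value suggests the data were not passed as a matrix.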