rmatlabchi-squaredcontingency

Reproduce the "chisq.test(cbind(x, y))" R code in Matlab


Introduction --> I am trying to reproduce the same results in R and in Matlab, of the chi-squared test on contingency tables, for two observed frequency datasets, x and y (therefore, x and y are my inputs). In particular, I am focusing on the special case where x=y.

In R, I get the following:

> x = c(1000,100,10,1)
> y = c(1000,100,10,1)
> chisq.test(x,y)

    Pearson's Chi-squared test

data:  x and y
X-squared = 12, df = 9, p-value = 0.2133

In Matlab, I get the following:

>> x = [1000 100 10 1];
>> y = [1000 100 10 1];
>> [~,chi2,p] = crosstab(x,y)
chi2 =
    12
p =
      0.21331

So far so god. However, intuitively, if I perform a chi-squared test for the same observed frequency datasets (i.e. for the x=y case), I would expect to get a test statistic equal to 0 and a p-value equal to 1. This can be achieved in R as follows:

> x = c(1000,100,10,1)
> y = c(1000,100,10,1)
> chisq.test(cbind(x, y))

    Pearson's Chi-squared test

data:  cbind(x, y)
X-squared = 0, df = 3, p-value = 1

Question --> How can I get the same in Matlab, i.e. a test statistic equal to 0 and a p-value equal to 1 for the x=y case?

Additional note --> If it can be useful, I have tried to play a bit with the Matlab code, and I was able to reproduce the [~,chi2,p] = crosstab(x,y) (but obviously not the equivalent of the R code for chisq.test(cbind(x, y))!):

% Inputs
x = [1000 100 10 1];
y = [1000 100 10 1];
% Chi-squared test on contingency tables
[unique_x,ix,jx] = unique(x);
[unique_y,iy,jy] = unique(y);
OT = zeros(numel(unique_x),numel(unique_y));
for i = 1:numel(x)
     OT(jx(i),jy(i)) = OT(jx(i),jy(i)) + 1;
end
R = sum(OT,2);
C = sum(OT);
n = sum(sum(OT));
ET = (R*C)./n;
chi2 = (OT - ET).^ 2 ./ ET;
chi2 = sum(chi2(:))                  % <-- test statistic
df = (numel(R)-1)*(numel(C)-1)       % <-- degrees of freedom
p = gammainc(chi2/2,df/2,'upper')    % <-- p-value
OT                                   % <-- OT = Observed frequency contingency table 
ET                                   % <-- ET = Expected frequency contingency table

Which returns the following:

chi2 =
    12
df =
     9
p =
      0.21331
OT =
     1     0     0     0
     0     1     0     0
     0     0     1     0
     0     0     0     1
ET =
         0.25         0.25         0.25         0.25
         0.25         0.25         0.25         0.25
         0.25         0.25         0.25         0.25
         0.25         0.25         0.25         0.25

Solution

  • To replicate the results of chisq.test(cbind(x, y)) where the test statistic is 0 and the p-value is 1 for identical datasets. You can create a custom implementation that allows you to construct the contingency table directly, as you did in R.

    Once you define your inputs:

    % Inputs
    x = [1000 100 10 1];
    y = [1000 100 10 1];
    

    You can do it in MATLAB via the script:

    % Combine x and y into a 2-column matrix
    observed = [x(:) y(:)];
    
    % Calculate the row and column totals
    row_totals = sum(observed, 2);
    col_totals = sum(observed, 1);
    
    % Calculate the grand total
    grand_total = sum(row_totals);
    
    % Compute the expected frequency table
    expected = (row_totals * col_totals) / grand_total;
    
    % Calculate the chi-squared statistic
    chi2 = sum((observed(:) - expected(:)).^2 ./ expected(:));
    
    % Degrees of freedom
    df = (size(observed, 1) - 1) * (size(observed, 2) - 1);
    
    % Compute the p-value
    p = 1 - chi2cdf(chi2, df);
    
    % Display the results
    disp(['Chi-squared statistic: ', num2str(chi2)])
    disp(['Degrees of freedom: ', num2str(df)])
    disp(['p-value: ', num2str(p)])
    

    In this case the observed matrix is a 2-column matrix where each row contains two variables x and y. The expected frequency table is calculated based on the row and column totals.

    The chi-squared statistic is calculated using the recorded and expected frequencies, the p-value is computed using the chi-squared cumulative distribution function.