rstatisticsanovafactorialexperimental-design

ANOVA for a three factor problem in R not giving the same results as textbook and hand calculations


I have completed example 5.3 from Design and Analysis of Experiments (Montgomery 2013, ISBN:9781118097939) in Excel and got the same results. However, when completing the ANOVA in R, I get different results. Although being familiar with python, I am still new to R, so I am not sure what I am doing wrong. I suspect it has something to do with the degrees of freedom calculated in R, since they differ significantly from the degrees of freedom calculated in Excel. Find attached the results as an image. Textbook results confirmed in Excel by myself. The question statement is attached here as well. Question statement.

My code in R was as follows

bottel_vul_data <- data.frame(A = rep(c(10, 12, 14), each = 2),
                              B = rep(c(25, 30), each = 12),
                              C = rep(c(200, 250), each = 6),        
                              vul = c(-3, -1, 0, 1, 5, 4,
                                      -1,  0, 2, 1, 7, 6,   
                                      -1,  0, 2, 3, 7, 9,
                                       1,  1, 6, 5, 10, 11))

bottel_anova <- aov(vul ~ A * B * C, data = bottel_vul_data)
summary(bottel_anova)

which yielded

            Df Sum Sq Mean Sq F value   Pr(>F)    
A            1 248.06  248.06 275.306 1.67e-11 ***
B            1  45.37   45.37  50.358 2.53e-06 ***
C            1  22.04   22.04  24.462 0.000146 ***
A:B          1   5.06    5.06   5.618 0.030675 *  
A:C          1   0.56    0.56   0.624 0.441016    
B:C          1   1.04    1.04   1.156 0.298229    
A:B:C        1   0.06    0.06   0.069 0.795626    
Residuals   16  14.42    0.90                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Although not part of the main question, I also have no clue as to what the Signif. codes mean. If anyone knows, that would be great for background knowledge.


Solution

  • The issue here is that R is interpreting variable A as if it is a numeric predictor, whereas in you text book example they have treated it as a factor (i.e., a categorical variable). Once you convert A to a factor in r you get the correct result (the same as the photo from your text book). There's many ways to do this but perhaps the easiest is:

    bottel_anova <- aov(vul ~ factor(A) * B * C, data = bottel_vul_data)
    summary(bottel_anova)
    
                  Df Sum Sq Mean Sq F value   Pr(>F)    
    factor(A)      2 252.75  126.38 178.412 1.19e-09 ***
    B              1  45.37   45.37  64.059 3.74e-06 ***
    C              1  22.04   22.04  31.118  0.00012 ***
    factor(A):B    2   5.25    2.63   3.706  0.05581 .  
    factor(A):C    2   0.58    0.29   0.412  0.67149    
    B:C            1   1.04    1.04   1.471  0.24859    
    factor(A):B:C  2   1.08    0.54   0.765  0.48687    
    Residuals     12   8.50    0.71                     
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    

    As to your signif. code question it is just a quick way of showing your p value in the final column (*** p < .001, ** p < .01, * p < .05 etcetera).