Can an ANOVA be carried out using a dataframe looking like this?
category_1 | category_2 | category_4 | category_5 |
---|---|---|---|
0.75 | 0.82 | 0.91 | 0.32 |
0.71 | 0.39 | 0.21 | 0.76 |
0.17 | 0.10 | 0.43 | 0.37 |
I already tried using unlist() to transform the data into long format. However, the column names then end up in a column without a name, and each one has an extra number appended to it. In that form, I don't think an ANOVA can be run. Is there another way?
"category_x" is the grouping variable, and I want to check whether some categories are used more often than others (higher category score = used more often).
Let us recreate your data frame and call it df:
df <- read.table(text = '
category_1 category_2 category_4 category_5
1 0.75 0.82 0.91 0.32
2 0.71 0.39 0.21 0.76
3 0.17 0.10 0.43 0.37')
To get these data into a suitable format for ANOVA, we can pivot to long format. This puts all the values in one column and creates another column that labels each value according to its original column. We can use pivot_longer() from the tidyverse (it lives in the tidyr package) to do this:
library(tidyverse)
df <- pivot_longer(df, everything(), names_to = 'Category', values_to = 'Value')
Now our data frame looks like this:
df
#> # A tibble: 12 x 2
#>    Category   Value
#>    <chr>      <dbl>
#>  1 category_1  0.75
#>  2 category_2  0.82
#>  3 category_4  0.91
#>  4 category_5  0.32
#>  5 category_1  0.71
#>  6 category_2  0.39
#>  7 category_4  0.21
#>  8 category_5  0.76
#>  9 category_1  0.17
#> 10 category_2  0.1
#> 11 category_4  0.43
#> 12 category_5  0.37
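If you would rather not load the tidyverse, base R's stack() does a similar reshaping. The following is only a sketch: the object names wide and long are my own, and the renaming step is there just to match the column names used in this answer.

```r
# Recreate the wide data (same values as in the question)
wide <- data.frame(
  category_1 = c(0.75, 0.71, 0.17),
  category_2 = c(0.82, 0.39, 0.10),
  category_4 = c(0.91, 0.21, 0.43),
  category_5 = c(0.32, 0.76, 0.37)
)

# stack() puts all values in a column called `values` and the
# original column names in a factor column called `ind`
long <- stack(wide)

# Rename to match the column names used in this answer
names(long) <- c("Value", "Category")

str(long)
```

The rows come out column by column rather than row by row, so the order differs from the pivot_longer() result, but that makes no difference to the model fitted below.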
We can now create a linear model of the values according to category and review the summary:
model <- lm(Value ~ Category, data = df)
summary(model)
#>
#> Call:
#> lm(formula = Value ~ Category, data = df)
#>
#> Residuals:
#>      Min       1Q   Median       3Q      Max
#> -0.37333 -0.19917 -0.06667  0.22417  0.39333
#>
#> Coefficients:
#>                    Estimate Std. Error t value Pr(>|t|)
#> (Intercept)         0.54333    0.18760   2.896    0.020 *
#> Categorycategory_2 -0.10667    0.26531  -0.402    0.698
#> Categorycategory_4 -0.02667    0.26531  -0.101    0.922
#> Categorycategory_5 -0.06000    0.26531  -0.226    0.827
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.3249 on 8 degrees of freedom
#> Multiple R-squared: 0.02204, Adjusted R-squared: -0.3447
#> F-statistic: 0.06009 on 3 and 8 DF, p-value: 0.9794
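To read the coefficient table, it may help to see how it relates to the plain group means. This sketch assumes R's default treatment contrasts, under which the intercept is the mean of the first level (category_1) and each other coefficient is that category's difference from it. The names long, group_means and rebuilt are mine, not part of the code above.

```r
# Long-format data, in the same row order as the tibble shown earlier
long <- data.frame(
  Category = rep(c("category_1", "category_2", "category_4", "category_5"),
                 times = 3),
  Value = c(0.75, 0.82, 0.91, 0.32,
            0.71, 0.39, 0.21, 0.76,
            0.17, 0.10, 0.43, 0.37)
)

model <- lm(Value ~ Category, data = long)

# Mean of Value within each category
group_means <- tapply(long$Value, long$Category, mean)

# Rebuild the group means from the coefficients:
# intercept alone for category_1, intercept + offset for the others
rebuilt <- c(coef(model)[1], coef(model)[1] + coef(model)[-1])

# Compare the two (names dropped so only the numbers are checked)
all.equal(unname(rebuilt), as.vector(group_means))
```

So the t-tests in the summary compare each category with category_1 only, whereas the F-statistic on the last line tests all categories at once.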
Finally, we can run our model through anova():
anova(model)
#> Analysis of Variance Table
#>
#> Response: Value
#>           Df  Sum Sq  Mean Sq F value Pr(>F)
#> Category   3 0.01903 0.006344  0.0601 0.9794
#> Residuals  8 0.84467 0.105583
Created on 2022-06-12 by the reprex package (v2.0.1)
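As an aside, the same one-way ANOVA can also be fitted directly with aov(), and the F statistic and p-value can be pulled out of the anova() table if you need them as numbers. This is a sketch rather than a required step, and the names long, fit, tab, f_value and p_value are my own.

```r
# Long-format data, in the same row order as the tibble shown earlier
long <- data.frame(
  Category = rep(c("category_1", "category_2", "category_4", "category_5"),
                 times = 3),
  Value = c(0.75, 0.82, 0.91, 0.32,
            0.71, 0.39, 0.21, 0.76,
            0.17, 0.10, 0.43, 0.37)
)

# aov() fits the same model; summary() prints the ANOVA table
fit <- aov(Value ~ Category, data = long)
summary(fit)

# anova() returns a data frame, so individual entries can be extracted
tab <- anova(lm(Value ~ Category, data = long))
f_value <- tab[["F value"]][1]  # F statistic for Category
p_value <- tab[["Pr(>F)"]][1]   # corresponding p-value
```

Both routes should give the same F value and p-value as the table above, since aov() fits the model through lm() internally.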