Can anyone give me a hint on how to run the Kruskal-Wallis Test below?
My objective: is there a significant difference in the growth (agg_rel_abund) of bacteria between Forest and Urban habitats, for each family?
The code I have tried in R: kruskal.test(Habitat ~ agg_rel_abund, data = my_data)
but I know that is wrong, because it does not address my objective.
Let me briefly explain my data:
There are two types of sample, F and W.
When the sample name starts with F, the Habitat is Urban.
When the sample name starts with W, the Habitat is Forest.
A Mann-Whitney test, or any other non-parametric test, is fine too, as long as it tells me whether the growth (agg_rel_abund) of bacteria differs significantly between Forest and Urban for each family.
Sample | Habitat | Family | agg_rel_abund |
---|---|---|---|
F10 | Urban | Acetobacteraceae | 0 |
F2 | Urban | Acetobacteraceae | 0 |
F3 | Urban | Acetobacteraceae | 0 |
F7 | Urban | Acetobacteraceae | 0.000132118 |
F8 | Urban | Acetobacteraceae | 0 |
W10 | Forest | Acetobacteraceae | 0 |
W13 | Forest | Acetobacteraceae | 0 |
W3 | Forest | Acetobacteraceae | 0 |
W6 | Forest | Acetobacteraceae | 0 |
W9 | Forest | Acetobacteraceae | 0 |
F10 | Urban | Bacillaceae | 0.00488836 |
F2 | Urban | Bacillaceae | 0.000924825 |
F3 | Urban | Bacillaceae | 0.001056943 |
F7 | Urban | Bacillaceae | 0.002378121 |
F8 | Urban | Bacillaceae | 0.002906593 |
W10 | Forest | Bacillaceae | 0.000264236 |
W13 | Forest | Bacillaceae | 0.027876866 |
W3 | Forest | Bacillaceae | 0.001585414 |
W6 | Forest | Bacillaceae | 0.001056943 |
W9 | Forest | Bacillaceae | 0.004492007 |
F10 | Urban | Carnobacteriaceae | 0 |
F2 | Urban | Carnobacteriaceae | 0 |
F3 | Urban | Carnobacteriaceae | 0 |
F7 | Urban | Carnobacteriaceae | 0 |
F8 | Urban | Carnobacteriaceae | 0.000132118 |
W10 | Forest | Carnobacteriaceae | 0 |
W13 | Forest | Carnobacteriaceae | 0 |
W3 | Forest | Carnobacteriaceae | 0.000132118 |
W6 | Forest | Carnobacteriaceae | 0 |
This question would be a better fit for Cross Validated.
If you want to know whether the growth varies with Family, irrespective of the Habitat, you can perform kruskal.test with agg_rel_abund as the dependent variable and Family as the independent variable.
kruskal.test(agg_rel_abund ~ Family, data = my_data)
If you are sure that there is no difference in growth across different families, you can directly perform kruskal.test with agg_rel_abund as dependent variable and Habitat as independent variable.
kruskal.test(agg_rel_abund ~ Habitat, data = my_data)
Kruskal-Wallis rank sum test
data: agg_rel_abund by Habitat
Kruskal-Wallis chi-squared = 0.0051556, df = 1, p-value = 0.9428
For each habitat, you can perform kruskal.test to check whether the difference in growth among families is significant:
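library(dplyr)  # for bind_rows()

out <- list()  # one result vector per habitat
for (i in unique(my_data$Habitat)) {
  # Kruskal-Wallis test across families, restricted to habitat i
  x <- kruskal.test(agg_rel_abund ~ Family,
                    data = my_data[my_data$Habitat == i, ])
  out[[i]] <- c(Kruskal.Wallis.H = x[["statistic"]][["Kruskal-Wallis chi-squared"]],
                Sig = x[["p.value"]],
                df = x[["parameter"]][["df"]])
}
# Stack the per-habitat results into one data frame and label the rows
out <- bind_rows(out)
out$Habitat <- unique(my_data$Habitat)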
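For the objective stated in the question (Forest vs Urban within each family), the same idea can be turned around: split the data by Family and test agg_rel_abund against Habitat inside each subset. The following is only a minimal sketch under some assumptions: the small `my_data` frame is rebuilt from two families in the question's table so the example runs on its own, the names `per_family`, `res`, and `Sig.adj` are made up for illustration, and the Benjamini-Hochberg adjustment is my own suggestion for handling many families, not something the question asked for.

```r
# Rows copied from the question's table (two families only, for brevity)
my_data <- data.frame(
  Sample  = rep(c("F10", "F2", "F3", "F7", "F8",
                  "W10", "W13", "W3", "W6", "W9"), times = 2),
  Habitat = rep(rep(c("Urban", "Forest"), each = 5), times = 2),
  Family  = rep(c("Acetobacteraceae", "Bacillaceae"), each = 10),
  agg_rel_abund = c(0, 0, 0, 0.000132118, 0,
                    0, 0, 0, 0, 0,
                    0.00488836, 0.000924825, 0.001056943, 0.002378121, 0.002906593,
                    0.000264236, 0.027876866, 0.001585414, 0.001056943, 0.004492007)
)

# One Kruskal-Wallis test (Urban vs Forest) per family
per_family <- lapply(split(my_data, my_data$Family), function(d) {
  x <- kruskal.test(agg_rel_abund ~ Habitat, data = d)
  data.frame(Family = d$Family[1],
             Kruskal.Wallis.H = unname(x$statistic),
             df = unname(x$parameter),
             Sig = x$p.value)
})

# Combine into one table; adjust p-values because one test is run per family
# (the choice of "BH" is an assumption, any p.adjust method could be used)
res <- do.call(rbind, per_family)
res$Sig.adj <- p.adjust(res$Sig, method = "BH")
res
```

With only two habitats, the Kruskal-Wallis test is essentially the Mann-Whitney (Wilcoxon rank-sum) test, so wilcox.test(agg_rel_abund ~ Habitat, data = d) could be swapped in. Expect a warning about ties from wilcox.test on this data, since many abundances are exactly 0.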