I am plotting some plant and soil data along a transect covering the sampling years 2000-2020, testing simple one-way ANOVA with the excellent agricolae package, and doing simple explorative base plot graphs.
However, I find it difficult to manipulate with the output from the agricolae package and the base plot function, and I would now like to make a nice (preferably ggplot) graph with data in proper order in presentation quality.
My data (named "biochemA") looks like this:
Year Transect Depth NH4_(ug/L)
2000 1 1 391,777
2009 1 1 471,616
2011 1 1 362,964
2013 1 1 348,208
2015 1 1 234,341
2017 1 1 768,844
2019 1 1 599,063
2020 1 1 912,435
2000 2 1 452,272
2009 2 1 391,134
2011 2 1 285,286
2013 2 1 755,01
2015 2 1 376,022
2017 2 1 1205,095
2019 2 1 2940,163
2020 2 1 298,487
2000 3 1 1409,322
2009 3 1 847,658
2011 3 1 332,635
2013 3 1 487,695
2015 3 1 337,721
2017 3 1 1702,21
2019 3 1 2684,409
2020 3 1 448,644
modelNH4<-aov(biochemA$NH4_(ug/L) ~ Year, data = biochemA) out_NH4 <- HSD.test(modelNH4,"Year", group=TRUE,console=TRUE) plot(out_NH4,main="NH4", xlab="Year", ylab="μg/g soil", las=2)
This gives me a graph like this:
which is fine for exploratory purposes. However, I would like to plot it with the years on the x-axis in chronological order (2020-2000), whereas the base plot function automatically displays the graphs by decreasing means. Also I would like to show standard error and not standard deviations.
The output from the HSD tst looks like this:
> out_bioANH4
$statistics
MSerror Df Mean CV MSD
173.8115 38 11.83804 111.3678 26.72761
$parameters
test name.t ntr StudentizedRange alpha
Tukey Year 8 4.53321 0.05
$means
biochemA1$NH4_.ug.g_soil._ std r Min Max
2000 7.662795 6.182201 5 3.000779 18.067119
2009 6.314628 3.589682 5 2.905835 12.042757
2011 4.866242 1.180330 5 3.698008 6.192181
2013 6.401148 3.330437 5 3.628023 10.752502
2015 4.286512 1.424005 5 2.777662 6.101243
2017 14.385232 8.981692 5 6.405698 29.624701
2019 42.317093 18.145402 5 12.244628 57.310143
2020 8.470637 4.223832 5 4.573570 15.409000
Q25 Q50 Q75
2000 3.547272 5.327341 8.371463
2009 3.728181 6.008561 6.887809
2011 4.097738 4.257121 6.086162
2013 3.684275 4.766714 9.174225
2015 3.089635 4.135425 5.328595
2017 9.569202 12.620927 13.705629
2019 38.758582 50.761796 52.510317
2020 5.761486 7.738720 8.870408
$comparison
NULL
$groups
biochemA1$NH4_.ug.g_soil._ groups
2019 42.317093 a
2017 14.385232 b
2020 8.470637 b
2000 7.662795 b
2013 6.401148 b
2009 6.314628 b
2011 4.866242 b
2015 4.286512 b
attr(,"class")
[1] "group"
What you'll probably have to do is take the elements from out_bioANH4
and use them in your custom plot, such as ggplot2
.
If you type out_bioANH4
at the prompt, you'll see that out_bioANH4$means
contains a matrix of the descriptive statistics of your data. If you use ggplot2
, you can get similar plots as the agricolae
package using geom_errorbar
.
See what this gives you:
library(ggplot2)
#this calculates SE from the summary statistics and adds it to the summary matrix
out_bioANH4$means$se <- out_bioANH4$means$std/sqrt(out_bioANH4$means$r)
ggplot(data=out_bioANH4$means) +
#this calculates ymin and ymax, which are the location of mean +/- SE
geom_errorbar(aes(ymin=out_bioANH4$means$biochemA1$NH4_.ug.g_soil._ - out_bioANH4$means$se,
ymax=out_bioANH4$means$biochemA1$NH4_.ug.g_soil._ + out_bioANH4$means$se,
x=row.names(out_bioANH4$means), color=out_bioANH4$groups$groups), width=0) +
#this puts points at the means of the graph and does a different color for each
geom_point(aes(y=out_bioANH4$means$biochemA1$NH4_.ug.g_soil._,
x=row.names(out_bioANH4$means),
color=out_bioANH4$groups$groups[order(row.names(out_NH4$groups))])) +
xlab("Year") +
ylab("NH4_(ug/L)") +
scale_colour_discrete(name="Group")
I couldn't reproduce your results with HSD using your raw data in your "biochemA" example. If your out_bioANH4
example is correct, this ggplot2
stuff should get you started.
Because the x
in the geom_errorbar
is in numerical order, the x axis should be ordered in increasing order. If you want to reverse it, you can do that easily with ggplot
.