I'm trying to submit an article with statistical data and figures. I like R and used it for that. I wrote this code:
```r
library(ggplot2)
library(ggpval)  # provides add_pval()

graph_a =
  ggplot(df, aes(x = group, y = squareBowmansCapsule, fill = isInfected)) +
  geom_boxplot() +
  xlab(label = "Groups") +
  ylab(label = "Bowman's capsule area sq.µm") +
  scale_fill_discrete(name = "") +
  theme(axis.text.x = element_text(size = 10), axis.title.x = element_text(size = 10)) +
  theme(axis.text.y = element_text(size = 10), axis.title.y = element_text(size = 10)) +
  theme(axis.text.x = element_blank()) +
  labs(title = "Bowman's capsule area") +
  theme(legend.position = c(0.1, 1),
        legend.direction = "vertical")

# Group labels drawn below the boxes
graph_a1 = graph_a +
  annotate(
    "text",
    x = c(1, 2, 3, 4),
    y = -1.5,
    label = c("Control", "1 group", "2 group", "3 group")
  )

# Kruskal-Wallis p-values for the control vs. each group
graph_a_with_pValue1 = add_pval(graph_a1,
                               pairs = list(c(1, 2), c(1, 3), c(1, 4)),
                               test = 'kruskal.test',
                               heights = c(14000, 16000, 18500))
```
and got a figure like this: [figure]
The editor's remarks were:

1. Please use commas to separate thousands for numbers with five or more digits (not four digits) in the picture, e.g., "10000" should be "10,000".
2. Please change the terms into scientific notation in the figure, e.g., "2 × 10⁻¹⁶", not "2e−16".
3. Please change "P" to lower case.
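I solved these problems with the following steps. First, I hard-coded the y-axis labels:

```r
scale_y_continuous(
  breaks = seq(0, 20000, by = 5000),  # labels must line up one-to-one with these breaks
  labels = c("0", "5000", "10,000", "15,000", "20,000")
)
```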
Then I manually added the p-value annotations:

```r
# Lower-case "p" and scientific notation, typed by hand
pval_annotations = c("'p = 2 × 10⁻¹³'",
                     "'p < 2 × 10⁻¹⁶'",
                     "'p = 4.7 × 10⁻¹¹'")

graph_a_with_pValue = add_pval(
  graph_a1,
  textsize = 8,
  annotation = pval_annotations,
  pairs = list(c(1, 2), c(1, 3), c(1, 4)),
  heights = c(14000, 16000, 18500)
)
```
Finally, I ended up with this code:
```r
library(ggplot2)
library(ggpval)  # provides add_pval()

graph_a =
  ggplot(df, aes(x = group, y = squareBowmansCapsule, fill = isInfected)) +
  geom_boxplot() +
  xlab(label = "Groups") +
  ylab(label = "Bowman's capsule area sq.µm") +
  scale_fill_discrete(name = "") +
  theme(
    axis.text.x = element_text(size = 12, face = "bold"),
    axis.title.x = element_text(size = 14, face = "bold")
  ) +
  theme(
    axis.text.y = element_text(size = 12, face = "bold"),
    axis.title.y = element_text(size = 14, face = "bold")
  ) +
  # Hard-coded labels; they must line up one-to-one with the breaks
  scale_y_continuous(
    breaks = seq(0, 20000, by = 5000),
    labels = c("0", "5000", "10,000", "15,000", "20,000")
  ) +
  theme(axis.text.x = element_blank()) +
  labs(title = "Bowman's capsule area") +
  theme(legend.position = c(0.1, 1),
        legend.direction = "vertical")

# Group labels drawn below the boxes
graph_a1 = graph_a +
  annotate(
    "text",
    x = c(1, 2, 3, 4),
    y = -1.5,
    label = c("Control", "1 group", "2 group", "3 group")
  )

# Manually typed p-values: lower-case "p", scientific notation
pval_annotations = c("'p = 2 × 10⁻¹³'",
                     "'p < 2 × 10⁻¹⁶'",
                     "'p = 4.7 × 10⁻¹¹'")

graph_a_with_pValue = add_pval(
  graph_a1,
  textsize = 8,
  annotation = pval_annotations,
  pairs = list(c(1, 2), c(1, 3), c(1, 4)),
  heights = c(14000, 16000, 18500)
)
```
and this result: [figure]
My question: how can I get the same result with less manual effort?
Two helper functions, both built on the `scales` package, as a starter:
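```r
# Comma-separated thousands, except for four-digit numbers ("5,000" -> "5000")
mycomma <- function(z) {
  out <- scales::label_comma()(z)
  sub("^(-?[0-9]),([0-9]{3})$", "\\1\\2", out)
}

# Scientific notation as a plotmath expression: "e" becomes "%*% 10^",
# which is rendered as "× 10" with a superscript exponent
myscientific <- function(z) {
  out <- scales::label_scientific()(z)
  out <- parse(text = sub("e", "%*% 10^", out))
  out[abs(z) < 1e-99] <- "0"  # otherwise we see "0 x 10^+0"
  out
}
```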
The threshold in `abs(z) < 1e-99` may need tuning for your actual data. It is used instead of `z == 0` to work around floating-point imprecision (see "Why are these numbers not equal?" and R FAQ 7.31), especially since we are dealing with very small numbers close to zero. With this sample data (in my quick testing), a threshold as small as `abs(z) < 1e-323` still worked, while `1e-324` showed `0 × 10^+0`: `1e-324` is below the smallest positive double R can represent, so it rounds to exactly `0` and `abs(z) < 0` is never true.
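If the regular expression in `mycomma()` feels fragile, the comma rule can also be decided on the magnitude of the number rather than on the formatted string. This is a base-R sketch, not part of `scales`; `comma_5plus()` is a hypothetical name, and it assumes the axis labels can be rounded to whole numbers (`digits = 0`).

```r
# Hypothetical alternative to mycomma(): use a thousands separator only when
# |z| >= 10,000 (five or more digits), as the editor asked.
# Labels are rounded to whole numbers (format = "f", digits = 0).
comma_5plus <- function(z) {
  plain <- formatC(z, format = "f", digits = 0)                   # e.g. "5000"
  comma <- formatC(z, format = "f", digits = 0, big.mark = ",")   # e.g. "10,000"
  ifelse(abs(z) >= 10000, comma, plain)
}

comma_5plus(c(0, 5000, 10000, 15000, 20000))
```

In the boxplot, `scale_y_continuous(labels = comma_5plus)` could then replace the hard-coded label vector, so the labels follow whatever breaks ggplot2 chooses.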
Sample data:

```r
dat <- data.frame(x = seq(2000, 15000, length.out = 5), y = seq(2e-10, 2e-5, length.out = 5))
```
A plot using the two helper functions in the `labels =` argument of each axis scale:

```r
library(ggplot2)
ggplot(dat, aes(x, y)) +
  geom_point() +
  scale_x_continuous(labels = mycomma) +
  scale_y_continuous(labels = myscientific)
```
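The p-value annotations typed by hand in the question could likewise be generated from numeric p-values. A minimal sketch, assuming (as the question's own code suggests) that `add_pval()` accepts annotation strings wrapped in single quotes, and that the font and graphics device render Unicode superscripts as in the question's figure. `superscript()`, `sci_unicode()`, `format_p()` and the `p_floor` cut-off are hypothetical helpers, not part of ggpval or scales.

```r
# Hypothetical helper: map ASCII digits and "-" to Unicode superscripts.
superscript <- function(s) {
  sup <- c("0" = "\u2070", "1" = "\u00b9", "2" = "\u00b2", "3" = "\u00b3",
           "4" = "\u2074", "5" = "\u2075", "6" = "\u2076", "7" = "\u2077",
           "8" = "\u2078", "9" = "\u2079", "-" = "\u207b")
  paste(sup[strsplit(s, "")[[1]]], collapse = "")
}

# Hypothetical helper: write a positive number x as "m × 10^e".
sci_unicode <- function(x, digits = 2) {
  e <- floor(log10(x))
  m <- signif(x / 10^e, digits)
  if (m >= 10) {  # rounding can push the mantissa to 10, e.g. 9.99e-5
    m <- m / 10
    e <- e + 1
  }
  paste0(format(m), " \u00d7 10", superscript(as.character(e)))
}

# Hypothetical helper: lower-case "p" labels, single-quoted like the question's.
# Assumes finite p-values in [0, 1]; anything below p_floor is shown as "p < ...".
format_p <- function(p, digits = 2, p_floor = 2e-16) {
  vapply(p, function(pi) {
    if (pi < p_floor) {
      paste0("'p < ", sci_unicode(p_floor, digits), "'")
    } else {
      paste0("'p = ", sci_unicode(pi, digits), "'")
    }
  }, character(1))
}

format_p(c(2e-13, 4.7e-11))
```

For 2e-13 and 4.7e-11 this gives the first and third strings from the question. The result can be passed as `annotation = format_p(p_values)`, where `p_values` (a hypothetical name) holds the p-values from your own tests, for example the `p.value` element of a `kruskal.test()` result.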