(see image in link for better explanation)
Trying to plot a log boxplot. I am very new to R and have tried to read tutorials but they all seem to use a different plotting function?
1/ I would like to know how to change y-axis values (i.e. to 0.001, 0.01, 0.1, 1 etc.) whilst retaining log scale?
2/ I would also like to know how to overlay a scatter plot of the data over the box?
3/ Finally, advice on how to add gridlines and border, of chosen weight and colour, and axis titles would be great?
So far, the only code I have used is:
boxplot(box, varwidth = TRUE, log = "y", las = 1)
Sorry it's so obvious but thanks guys!
Reproducible example (first 30 data points):
structure(list(CD = c(0.291998350286, 58.4266839332, 1.27227891359,
7.05106388302, 0.000175203165079, 14.5665189804, 0.991317477169,
1.56817217741, 30.4733699427, 0.421737157934, 1.42372160368,
0.333712081068, 0.126643859356, 0.339337851064, 0.151788605996,
3.81711532569, 1.54344215823, 17.2540240816, 3.67548135199, 4.08331544672,
0.0549081111653, 0.0734888395127, 5.16751927204, 22.6971132167,
1.04321972985, 0.184343635879, 2.29291935133, 0.0555342051937,
0.411328596454, 51.3157360015), WD = c(0.402162969955, 0.189544929529,
0.000840280055822, 0.0501429051167, 3.4853343866, 0.0286017538011,
0.0121948073037, 0.992426638872, 0.0192559537415, 0.00398698494632,
0.888543226817, 0.703331842713, 0.378008558951, 4.70639786908,
0.113706495683, 1.32546254378, 0.936899368015, 0.108969215053,
0.25593198462, 0.564518000036, 0.121389166752, 0.195884521759,
0.704964462359, 1.25602965005, 0.0242662609253, 2.11883481514,
0.44581781826, 0.659586439033, 0.36869665263, 0.824802234027),
MC = c(0.0817800846374, 1.70562818122, 0.0807325401412, 0.180484111266,
0.0438908620273, 8.75617400342, 0.479370274286, 0.908307567192,
2.81446961622, 0.0699990348088, 0.0491805903311, 0.00573142245572,
0.116352754956, 0.311847695137, 0.0414215549125, 0.104499713126,
0.0551723673287, 0.076199002014, 0.191940770942, 4.11745930602,
1.75751348869, 0.0517694407553, 2.29459310871, 0.0269233884783,
0.097992042257, 11.7325079183, 0.262543381616, 0.748125397347,
0.635821595694, 0.794256126423), WC = c(0.0686062258206,
0.514240129693, 7.68226019254, 4.36776848419, 0.618214352027,
2.13911888244, 0.0392505689889, 0.0823059942863, 2.36466448826,
0.0688590035687, 0.151457824484, 0.260629997743, 8.30460664472,
0.235838508742, 0.41960151168, 4.38818043685, 0.0797918590848,
0.109025596179, 0.0837286212892, 0.0117251770506, 1.17739717792,
0.207413909376, 8.62180088733, 2.33021344099, 0.166981061366,
1.13410263425, 0.0905601584251, 0.154075808752, 0.140498581833,
0.213863468391), MWC = c(301.891645135, 0.672405306137, 0.105110378336,
5.36947765018, 0.672138277335, 3.58296467263, 10.7754596083,
5.01795685162, 0.0775842457366, 1.07683084271, 1.0360624974,
16.8763517534, 0.390002867544, 1.50618637339, 0.371973397842,
1.28366689573, 0.0633246500391, 0.0364964802158, 0.249895194073,
0.0379084221473, 0.0798275709535, 0.504735639066, 8.12262202509,
82.5787360252, 0.068574731873, 8.76779568117, 0.00873932360562,
0.0142029221366, 0.0228083224849, 0.146073745479)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -30L))
Lots of questions in one here, which really boil down to "how to use ggplot2". Here's a good introductory guide.
First, your data are in "wide" format; ggplot2 works better with "long" format (one column for data names, one for their values). We can use tidyr::pivot_longer() for that. By default it generates two new columns called "name" and "value".
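To make the wide-to-long step concrete, here is a minimal sketch on a made-up two-column data frame (the names wide, long, A and B are hypothetical and not part of your data):

```r
library(tidyr)

# Hypothetical toy data in "wide" format: one column per group
wide <- data.frame(A = c(1, 2), B = c(10, 20))

# Stack every column into name/value pairs ("long" format)
long <- pivot_longer(wide, everything())
long
```

Each original column name ends up in the "name" column and each measurement in the "value" column, which is the shape that aes(name, value) expects.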
For a boxplot we use geom_boxplot(). By "scatter plot" I think you mean "jitter plot", which is the usual way to overlay individual data points on a boxplot. The appropriate function is geom_jitter().
Labels for y-axis values can be altered in several different ways. One is to use functions from the scales package. Another is to supply a labelling function, as in the combined code at the end.
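If you want ticks at specific values such as 0.001, 0.01, 0.1, 1 and so on, one option is the breaks argument of scale_y_log10(). The sketch below assumes made-up data; df, tick_values and plain_labels are hypothetical names chosen for illustration:

```r
library(ggplot2)

# Hypothetical data spanning several orders of magnitude
df <- data.frame(
  name  = rep(c("A", "B"), each = 5),
  value = c(0.002, 0.03, 0.4, 5, 60, 0.001, 0.05, 0.2, 3, 80)
)

# Powers of ten from 0.001 to 100, used as tick positions
tick_values <- 10^(-3:2)

# Labelling function: plain decimals, no padding, no trailing zeros
plain_labels <- function(x) {
  format(x, scientific = FALSE, drop0trailing = TRUE, trim = TRUE)
}

p <- ggplot(df, aes(name, value)) +
  geom_boxplot() +
  scale_y_log10(breaks = tick_values, labels = plain_labels)
p
```

The axis stays logarithmic; breaks only controls where the ticks are drawn and labels controls how they are printed.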
Axis titles can be added using the labs() function.
Gridlines and border of chosen weight and color: it depends on exactly what you want, but in general you would use theme() and look for the arguments whose names start with "panel". In the combined example at the end we add a thick red border.
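For gridlines specifically, a possible starting point is the panel.grid.major and panel.grid.minor theme elements. This is only a sketch: the colours and widths are arbitrary, grid_theme is a hypothetical name, and the linewidth argument assumes ggplot2 3.4.0 or later (earlier versions use size instead):

```r
library(ggplot2)

# Hypothetical styling values; swap in whatever colours and widths you like
grid_theme <- theme_bw() +
  theme(
    # main gridlines: mid-grey, solid
    panel.grid.major = element_line(colour = "grey60", linewidth = 0.5),
    # minor gridlines: lighter, thinner, dashed
    panel.grid.minor = element_line(colour = "grey85", linewidth = 0.25,
                                    linetype = "dashed"),
    # panel border: fill = NA keeps the plot area visible
    panel.border = element_rect(fill = NA, colour = "blue", linewidth = 1)
  )
```

Adding grid_theme to any plot with + applies these settings in one go.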
So putting all of that together:
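library(ggplot2)
library(tidyr)
library(dplyr)  # provides the %>% pipe

box %>%
  # wide to long: one row per observation, columns "name" and "value"
  pivot_longer(everything()) %>%
  ggplot(aes(name, value)) +
  # hide the boxplot's own outlier points; the jitter layer draws every point
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.2) +
  # log10 y-axis with plain (non-scientific) tick labels
  scale_y_log10(labels = function(x) format(x, scientific = FALSE)) +
  theme_bw() +
  # thick red panel border; ggplot2 versions before 3.4.0 use size = 2 instead of linewidth
  theme(panel.border = element_rect(fill = NA, color = "red", linewidth = 2)) +
  labs(x = "Group", y = "Value")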
Result. Hope that helps you to get started.