I have the following dataframe df:
df <- data.frame(
  beta = c(0.45, -0.12, 0.33, -0.07, 0.21, 0.65, -0.18, 0.09),
  se = c(0.05, 0.03, 0.04, 0.02, 0.06, 0.07, 0.03, 0.05),
  prs_trait1 = c(
    "Rose Growth Index",
    "Tulip Petal Width",
    "Sunflower Height",
    "Daisy Bloom Count",
    "Orchid Stem Thickness",
    "Lily Leaf Area",
    "Carnation Bud Count",
    "Violet Flower Density"
  ),
  OR_CI_text = c(
    "1.57(1.46-1.70)",
    "0.89(0.84-0.94)",
    "1.39(1.28-1.52)",
    "0.93(0.90-0.96)",
    "1.23(1.15-1.32)",
    "1.92(1.72-2.15)",
    "0.83(0.79-0.87)",
    "1.10(1.02-1.18)"
  ),
  N = c(1200, 950, 1350, 890, 1020, 1600, 800, 1150) # Number of observations
)
I wrote the code for this forest plot:
# Create the forest plot
# (name and se must refer to columns that exist in df: prs_trait1 and se)
ggforestplot::forestplot(
  df = df,
  name = prs_trait1,
  estimate = beta,
  se = se,
  xlab = "Odds ratio (95% CI)",
  title = NULL,
  grid = FALSE
)
Now, I would like to add OR_CI_text on the right of this forest plot, with the trait and N on the left, as in the attached figure:
How can I adjust my R code? Thanks!
Although this proposed solution uses base R rather than ggplot2 or forestplot, I first learned plotting in base R, so I'm personally more familiar with base's control over the exact placement of everything. ggplot is also great and gives plenty of control as well, so I'm sure there is a way using these external packages.
First, while you can transform your beta and SE values to the OR and confidence intervals, I'm just going to extract them from the text values in df$OR_CI_text and keep them in a reference matrix called plotvals:
# extract point and lo/hi values from text
plotvals <- do.call(rbind,
                    lapply(regmatches(df$OR_CI_text, gregexpr("([0-9.]+)", df$OR_CI_text)),
                           as.numeric))
#      [,1] [,2] [,3]
# [1,] 1.57 1.46 1.70
# [2,] 0.89 0.84 0.94
# [3,] 1.39 1.28 1.52
# [4,] 0.93 0.90 0.96
# [5,] 1.23 1.15 1.32
# [6,] 1.92 1.72 2.15
# [7,] 0.83 0.79 0.87
# [8,] 1.10 1.02 1.18
This gives us the raw values we want to plot.
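If you would rather go the other route and compute the values from beta and SE, a minimal sketch could look like the following. It assumes that beta is a log odds ratio and that a 95% Wald interval is wanted; neither is stated in the question, and plotvals_calc is just an illustrative name. The result need not match OR_CI_text exactly: for the first row it gives roughly 1.57 (1.42-1.73), while the text column says 1.57(1.46-1.70).

```r
# Values copied from the beta and se columns of the question's df
beta <- c(0.45, -0.12, 0.33, -0.07, 0.21, 0.65, -0.18, 0.09)
se   <- c(0.05, 0.03, 0.04, 0.02, 0.06, 0.07, 0.03, 0.05)

# Assumption: beta is a log odds ratio and we want a 95% Wald interval
z <- qnorm(0.975)  # about 1.96

# Same column layout as plotvals: point estimate, lower bound, upper bound
plotvals_calc <- cbind(exp(beta), exp(beta - z * se), exp(beta + z * se))
round(plotvals_calc, 2)
```

Because it has the same three-column layout, plotvals_calc could be used in place of plotvals in the plotting code.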
Now we can build the plot manually. If you're unfamiliar with base R plotting, I would suggest running this line-by-line to see what each command is adding to the plot as we build it:
# Set some global parameters
seqx <- 2^c(-3:3) # for OR axis labels
xx <- range(0.0005, 1e2) # overall x-axis size
yy <- seq_len(nrow(df)) # placement of graphics and text on y axis
# Initiate blank plot
plot(x = log(xx),
     y = c(0.9, nrow(df)+0.1),
     type = "n", axes = FALSE,
     xlab = NA, ylab = NA)
# Add in axes and labels
axis(1, at = log(seqx), labels = seqx)
axis(3, at = log(seqx), labels = seqx)
mtext(side = 1, "Odds Ratio", at = 0, padj = 4)
mtext(side = 3, "Odds Ratio", at = 0, padj = -4)
# Add in colored bars behind everything else
rect(xleft = log(xx[1]), xright = log(xx[2]),
     ybottom = yy[c(TRUE, FALSE)]-0.5, ytop = yy[c(FALSE, TRUE)]-0.5,
     col = "lightgrey", border = NA)
# Add reference line
abline(v = log(1), lty = 3, lwd = 0.75)
# Add in OR and CI points and lines
segments(x0 = log(plotvals[,2]),
         x1 = log(plotvals[,3]),
         y0 = yy)
points(x = log(plotvals[,1]),
       y = yy, pch = 22, bg = "maroon")
# Add in text
text(df$prs_trait1, x = log(2^-5), y = yy, pos = 2, xpd = TRUE) # covariate
text(df$N, x = log(2^-4), y = yy, pos = 2, xpd = TRUE) # n
text(df$OR_CI_text, x = log(2^6), y = yy, pos = 2) # OR (CI)
text(c("Covariate", "n", "OR (95% CI)"), # top labels
     x = log(2 ^ c(-5, -4, 6)),
     y = max(yy) + 1,
     xpd = TRUE, pos = 2)
The final plot looks like this:
This is a relatively basic plot. Of course, you can play around with the different parameters to fine-tune it to exactly what you want it to look like. Good luck!
Personal note: if the lack of space between the point estimate and parentheses of the CI is bugging you like it is me, you can replace the relevant line of text with:
text(gsub("\\(", " \\(", df$OR_CI_text), x = log(2^6), y = yy, pos = 2) # OR (CI)
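One caveat if you reuse this with a different data frame: the rect() call pairs yy[c(TRUE, FALSE)] with yy[c(FALSE, TRUE)], and those two vectors have different lengths when the number of rows is odd. As far as I can tell, the shorter one is then recycled and the last grey band comes out with the wrong height. A small sketch of a version that should work for any row count is below; stripe_bounds is a hypothetical helper name, not something from a package.

```r
# Hypothetical helper (not part of the plotting code above): bounds for the
# grey bands behind rows 1, 3, 5, ..., valid for odd or even row counts.
stripe_bounds <- function(n) {
  rows <- seq(1, n, by = 2)  # every other row, starting at the first
  data.frame(ybottom = rows - 0.5, ytop = rows + 0.5)
}

b <- stripe_bounds(7)  # e.g. a data frame with 7 rows
b

# Then, in place of the rect() call in the plotting code:
# rect(xleft = log(xx[1]), xright = log(xx[2]),
#      ybottom = b$ybottom, ytop = b$ytop,
#      col = "lightgrey", border = NA)
```

For the 8-row example here, stripe_bounds(nrow(df)) gives the same bands as the original call, so nothing changes in the plot shown.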