I have measured the binding affinity of several biomolecules to each other (a ligand and a target). This ideally results in a sigmoidal curve of the data. The inflection point represents the Kd value (dissociation constant), which is an indicator for the binding strength between said molecules.
The x axis is the ligand concentration and the y axis is the observed response, Fnorm or other measure (usually a number between 1 and 1000). This attached image shows what it looks like:
I have no issue plotting my data points in ggplot2
and actually getting a decent graph. However, I also need the actual sigmoidal curve going through the data points (with a 95% CI around it).
In some cases the ligand can't be concentrated any higher, meaning that a part of one of the "plateaus" of the curve is missing. I would like to have a sigmoidal curve that tries to "guess" or estimate what the full data distribution looks like, even if a small part of it is missing. This attached image shows what I mean by that:
Notice how the lower right part of the curve doesn't just end with the data points. But instead it "approximates" or "guesses" the rest of the sigmoidal curve from the last data points.
Therefore, I would also like to have some kind of goodness-of-fit measure I can show, to let people know how well this sigmoidal model fits/describes the data. Not sure if something like the McFaddens´s R2 value would be good, but something like this.
Finally, I of course also need the inflection point as an output (= a value on the x axis).
I have found somewhat similar questions, but they are not exactly what I need and I have failed adapting the solutions to me:
This is the dummy example data of how a triplicate affinity measurement might look like:
ligand-conc (x): 5.289E-09, 1.058E-08, 2.115E-08, 4.231E-08, 8.462E-08, 1.692E-07, 3.385E-07, 6.769E-07, 1.354E-06, 2.708E-06, 5.415E-06, 1.083E-05, 2.166E-05, 4.332E-05, 8.665E-05, 1.733E-04
Fnorm exp.a (y): 792.6444, 792.8537, 788.0273, 793.9693, 792.3848, 792.311, 790.5109, 790.4974, 796.1723, 790.8627, 790.2954, 784.7171, 773.0447, 760.8085, 745.5512, 738.3463
Fnorm exp.b (y): 790.2453, 793.8565, 789.5286, 791.8368, 788.5138, 790.0382, 792.85, 789.1439, 790.3487, 792.1872, 786.6738, 780.0627, 775.8658, 762.8376, 747.4288, 737.8717
Fnorm exp.c (y): 788.2453, 790.5648, 792.8529, 790.1368, 793.5138, 791.7038, 788.85, 791.1439, 789.4487, 788.8872, 789.5674, 783.3063, 774.8658, 764.5838, 749.4288, 736.5872
Here is what it looks like in an excel format: excel file
This is the code I have used thus far:
mydata <- read.csv("example")
names(mydata)[names(mydata) == "ligand.conc"] <- "ligand" #different name of a column for convenience
mydata$ligand <- as.numeric(mydata$ligand)
mydata$ligand <- mydata$ligand*1000 #changing the unit of the concentration from M to mM
mydata$ligand <- mydata$ligand*1000 #changing the unit of the concentration from mM to µM
mydata$Fnorm <- as.numeric(mydata$Fnorm)
base = ggplot(mydata)
base +
geom_point(aes(x=concentration, y=Fnorm, color=experiment))+
geom_smooth(aes(x=ligand, y=Fnorm),
method = drm, method.args = list(fct = L.4()), se = FALSE)+
theme_bw() +
theme(
axis.line.x.bottom = element_line(color = 'black'),
axis.line.y.left = element_line(color = 'black'),
axis.line.y.right = element_line(color = 'black'),
panel.grid.minor.x = element_blank(),
panel.border = element_blank(),
axis.title.x = element_markdown(),
axis.title.y = element_markdown(),
axis.minor.ticks.length = rel(1),
axis.text = element_text(color = "black",
size = 10),
axis.ticks=element_line(linewidth=0.6),
axis.ticks.length = unit(2.75, "pt"),
) +
scale_color_manual(
name="Replicate",
labels = c("1", "2", "3"),
values = c("sienna1", "dodgerblue","grey43")) +
coord_cartesian(ylim = c(720, 800), expand = TRUE)+
scale_x_continuous(trans="log10",
expand = c(0, 0),
label = label_number(),
breaks = c(0.01, 0.1, 1, 10, 100, 1000),
guide = guide_axis_logticks(long = 2.3, mid = 1.65, short = 0.75),
limits = c(0.001,1000))+
labs(
y="Fnorm [%<sub>280</sub>]",
x="Ligand conc. [µM]"
)+
print(base)
This yields the following plot:
The curve appears very rough/jagged and not very smooth. As described above, the curve is incomplete as it would have to guess what the lower right part looks like. It is also missing the 95% CI. Further, I am not sure how to get the inflection point and goodness-of-fit outputs.
As of yet I have used the drc
package (as described here), but this calculates a so-called ED50 value; I am unsure whether this equals the inflection point and thus Kd.
In order to get the 95% CI around the curve, I have tried to replace
geom_smooth(aes(x=ligand, y=Fnorm),
method = drm, method.args = list(fct = L.4()), se = FALSE)+
with
geom_smooth(aes(x=ligand, y=Fnorm),
method = drm, method.args = list(fct = L.4()), level=0.95)+
however, this produces this error:
geom_smooth() using formula = 'y ~ x' Failed to fit group -1. Caused by error in pred$fit: $ operator is invalid for atomic vectors
I am also unsure why I have to specify aes (x, y)
twice in a row in the same ggplot. Adding geom_smooth()
to the plot, without the aes()
part, returns the error:
stat_smooth() requires the following missing aesthetics: x and y.
Adding geom_smooth(aes(x,y))
even just a little further down in the code produces:
Failed to fit group -1. Caused by error in method(): Convergence failure: singular convergence (7)"
Here's what I have so far. While you can use fitting methods like drc::drm()
within ggplot
, it's usually easier (in my experience) to apply them externally, then feed the results into ggplot
.
lvec <- c(5.289E-09, 1.058e-08, 2.115e-08, 4.231e-08, 8.462e-08, 1.692e-07,
3.385e-07, 6.769e-07, 1.354e-06, 2.708e-06, 5.415e-06, 1.083e-05,
2.166e-05, 4.332e-05, 8.665e-05, 1.733e-04)
expa <- c(792.6444, 792.8537, 788.0273, 793.9693, 792.3848, 792.311, 790.5109, 790.4974, 796.1723, 790.8627, 790.2954, 784.7171, 773.0447, 760.8085, 745.5512, 738.3463)
expb <- c(790.2453, 793.8565, 789.5286, 791.8368, 788.5138, 790.0382, 792.85, 789.1439, 790.3487, 792.1872, 786.6738, 780.0627, 775.8658, 762.8376, 747.4288, 737.8717)
expc <- c(788.2453, 790.5648, 792.8529, 790.1368, 793.5138, 791.7038, 788.85, 791.1439, 789.4487, 788.8872, 789.5674, 783.3063, 774.8658, 764.5838, 749.4288, 736.5872)
dd <- data.frame(conc=lvec*1000,
Fnorm = c(expa, expb, expc),
experiment = rep(c("a","b","c"), each = 16))
library(drc)
fit1 <- drm(Fnorm ~ conc, data = dd, fct = LL.4())
## explicitly ask for predictions over a wider range than the data:
## these values specify the range of the predictions
minval <- 4e-6; maxval <- 10
pframe <- data.frame(
conc = exp(seq(log(minval), log(maxval), length.out = 101)))
pframe <- cbind(pframe,
predict(fit1, newdata = pframe, se.fit = TRUE))
names(pframe)[2:3] <- c("Fnorm", "Fnorm_se")
library(ggplot2)
ggplot(dd, aes(conc, Fnorm)) +
geom_point(aes(colour = experiment)) +
geom_line(data = pframe, colour = "black") +
geom_ribbon(data = pframe,
colour = NA, ## no line at edges of ribbon
fill = "black", ## colour of ribbon
alpha = 0.1, ## make ribbon semi-transparent
aes(ymin = Fnorm - 2*Fnorm_se,
ymax = Fnorm + 2*Fnorm_se)) +
scale_x_log10() +
geom_hline(yintercept = coef(fit1)[2], lty = 2) +
geom_vline(xintercept = coef(fit1)[4], lty = 2)
The inflection point is coef(fit1)[4]
= 0.0494.
Note these are ± 2 SE confidence intervals but you're welcome to use ± 1.96 SE (or some appropriate t-distribution-based interval) instead.
Not sure offhand about the goodness-of-fit statistic, although summary()
does give you the residual standard error (from which you should be able to compute an R^2 value ...) Or compute cor(predict(fit1), dd$conc)^2
= 0.902.