Suppose there are three binomial experiments conducted chronologically. For each experiment, I know the number of trials as well as the number of successes. To use the first two, older experiments as a prior for the third experiment, I want to "fit a Bayesian hierarchical model on the two older experiments and use the posterior from that as the prior for the third experiment".
Given my available data (below), my question is: is my rstanarm code below capturing what I described above?
Study1_trial = 70
Study1_succs = 27
#==================
Study2_trial = 84
Study2_succs = 31
#==================
Study3_trial = 100
Study3_succs = 55
What I have tried in the rstanarm package:
library("rstanarm")
data <- data.frame(n = c(70, 84, 100), y = c(27, 31, 55))
mod <- stan_glm(cbind(y, n - y) ~ 1, prior = NULL, data = data,
                family = binomial(link = 'logit'))
## can I use a beta(1.2, 1.2) as prior for the first experiment?
TL;DR: If you were directly predicting the probability of success, the model would be a Bernoulli likelihood with parameter theta (the probability of success), which can take values between zero and one, and you could use a Beta prior for theta. But with a logistic regression model, you're actually modeling the log odds of success, which can take any value from -Inf to Inf, so a prior defined on the whole real line, such as a normal distribution with a scale set by the available prior information, is more appropriate.
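If you were modeling theta directly, the update is conjugate and needs no MCMC. Here's a minimal sketch of that Beta-Binomial update, using the Beta(1.2, 1.2) from your question and the Study 1 counts purely for illustration:

# Conjugate Beta-Binomial update on the probability scale: if
# theta ~ Beta(a, b) and we observe y successes in n trials, then
# theta | y ~ Beta(a + y, b + n - y)
a <- 1.2; b <- 1.2                          # the Beta(1.2, 1.2) prior from the question
post_a <- a + 27; post_b <- b + (70 - 27)   # update with Study 1 (27 successes in 70 trials)
qbeta(c(0.025, 0.5, 0.975), post_a, post_b) # posterior quantiles for theta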
For a model where the only parameter is the intercept, the prior is the probability distribution for the log odds of success. Mathematically, the model is:
log(p/(1-p)) = a

where p is the probability of success and a, the parameter you're estimating, is the intercept, which can be any real number. If the odds of success are 1:1 (that is, p = 0.5), then a = 0. If the odds are greater than 1:1, then a is positive; if the odds are less than 1:1, then a is negative.
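In R you can explore this transform with qlogis() (the logit, i.e. the log odds) and plogis() (its inverse):

qlogis(0.5)           # 0: odds of 1:1 give a = 0
qlogis(c(0.25, 0.75)) # -1.0986 and 1.0986: symmetric around p = 0.5
plogis(0)             # 0.5: back-transform a log odds to a probability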
Since we want a prior for a, we need a probability distribution that can take on any real value. If we didn't know anything about the odds of success, we might use a very weakly informative prior like a normal distribution with, say, mean = 0 and sd = 10 (this is the rstanarm default), meaning that one standard deviation would encompass odds of success ranging from about 22000:1 to 1:22000! So this prior is essentially flat.
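You can see just how flat by back-transforming one standard deviation to the odds scale:

exp(c(-10, 10)) # about 4.5e-05 and 22026, i.e. odds of 1:22000 and 22000:1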
If we take your first two studies to construct the prior, we can use the probability density based on those studies and then transform it to the log odds scale:
# Possible outcomes (that is, the possible number of successes)
s = 0:(70+84)
# Probability density over all possible outcomes
dens = dbinom(s, 70+84, (27+31)/(70+84))
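A couple of quick sanity checks on that density:

sum(dens)          # 1: the probabilities over all possible outcomes sum to 1
s[which.max(dens)] # 58: the mode lands on the 27 + 31 observed successes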
Assuming we'll use a normal distribution for the prior, we want the most likely probability of success (which will be the mean of the prior) and its standard deviation.
# Prior parameters
pp = s[which.max(dens)]/(70+84) # most likely probability
psd = sum(dens * (s/max(s) - pp)^2)^0.5 # standard deviation
# Convert prior to log odds scale
pp_logodds = log(pp/(1-pp))
psd_logodds = log(pp/(1-pp)) - log((pp-psd)/(1 - (pp-psd)))
c(pp_logodds, psd_logodds)
[1] -0.5039052 0.1702006
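As a cross-check on that conversion, the delta method (scaling the standard deviation on the probability scale by the derivative of the logit, 1/(p(1-p))) gives essentially the same value:

psd / (pp * (1 - pp)) # ~0.166, in line with the values above and below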
You could generate essentially the same prior by running stan_glm on the first two studies with the default (flat) prior:
prior = stan_glm(cbind(y, n - y) ~ 1,
                 data = data[1:2, ],
                 family = binomial(link = 'logit'))
c(coef(prior), se(prior))
[1] -0.5090579 0.1664091
Now let's fit the model to the Study 3 data, first with the default prior and then with the prior we just generated. I've switched to a data frame of individual Bernoulli trials (one row per trial), since stan_glm seems to fail when the aggregated data frame has only one row (as in data = data[3, ]).
# Default weakly informative prior
mod1 <- stan_glm(y ~ 1,
                 data = data.frame(y = rep(0:1, c(45, 55))),
                 family = binomial(link = 'logit'))
# Prior based on Studies 1 & 2
mod2 <- stan_glm(y ~ 1,
                 data = data.frame(y = rep(0:1, c(45, 55))),
                 prior_intercept = normal(location = pp_logodds, scale = psd_logodds),
                 family = binomial(link = 'logit'))
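It's worth checking that the prior was applied as intended; rstanarm can rescale default priors, and prior_summary() reports what was actually used:

prior_summary(mod2) # shows the intercept prior actually used in the fit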
For comparison, let's also generate a model with all three studies and the default flat prior. We would expect this model to give virtually the same results as mod2:
mod3 <- stan_glm(cbind(y, n - y) ~ 1,
                 data = data,
                 family = binomial(link = 'logit'))
Now let's compare the three models:
library(tidyverse)
list(`Study 3, Flat Prior` = mod1,
     `Study 3, Prior from Studies 1 & 2` = mod2,
     `All Studies, Flat Prior` = mod3) %>%
  map_df(~data.frame(log_odds = coef(.x),
                     p_success = predict(.x, type = "response")[1]),
         .id = "Model")
                              Model   log_odds p_success
1               Study 3, Flat Prior  0.2008133 0.5500353
2 Study 3, Prior from Studies 1 & 2 -0.2115362 0.4473123
3           All Studies, Flat Prior -0.2206890 0.4450506
For Study 3 with the flat prior (row 1), the predicted probability of success is 0.55, as expected, since that's what the data says and the prior provides no additional information.
For Study 3 with a prior based on Studies 1 and 2, the probability of success is 0.45. The lower estimate reflects the additional information contributed by the lower success rates in Studies 1 and 2. In fact, the probability of success from mod2 is essentially what you'd calculate directly from the pooled data: with(data, sum(y)/sum(n)). mod3 puts all the information into the likelihood instead of splitting it between the prior and the likelihood, but is otherwise essentially the same as mod2.
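Running that pooled calculation makes the agreement explicit:

with(data, sum(y) / sum(n)) # 0.4448819, the pooled success rate across all three studies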
Answer to (now deleted) comment: If all you know is the number of trials and successes and you think that a binomial probability is a reasonable model for how the data were generated, then it doesn't matter how you split up the data into "prior" and "likelihood" or whether you shuffle the order of the data. The resulting model fit will be the same.
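As a minimal illustration of that order-invariance (assuming a conjugate Beta(1, 1) starting prior, purely to keep the bookkeeping explicit), updating study by study and updating all at once give identical posteriors:

# Sequential: each study's posterior becomes the next study's prior
a <- 1; b <- 1 # Beta(1, 1) starting prior (an assumption for illustration)
for (i in 1:3) {
  a <- a + data$y[i]
  b <- b + data$n[i] - data$y[i]
}
# All at once
a2 <- 1 + sum(data$y); b2 <- 1 + sum(data$n - data$y)
c(a == a2, b == b2) # TRUE TRUE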