I am trying to compute a log-likelihood, where the likelihood is a convolution of two distributions, and the distribution parameters depend on the value of that data point. More formally, I believe my data follows a likelihood which is the sum of two gamma distributions, which have separate shape and scale parameters and the shape parameter is a function of data. Written mathematically, yi=pGamma(k1+b1 xi, t1)+(1-p)Gamma(k2+b2 xi, t2). Note the plus sign is an arithmetic operation on the random variables, not between the CDFs. This is not a mixture model. The likelihood is a convolution between weighted random variables that have to be done per observation, which is very slow.
I currently am using the distr package, and this is what I have to compute the likelihood.
sapply(1:10000, function(i){distr::d(p*Gammad(shape = k1+b1*x[i], scale = t1)+(1-p)*Gammad(shape = k2+b2*x[i], scale = t2))(y[i])})
Is there a way to vectorize this? Is there any other way to speed up the code?
When I try
distr::d(Vectorize(p*Gammad(shape = k1+b1*x, scale = t1)+(1-p)*Gammad(shape = k2+b2*x, scale = t2)))(y)
I get the error "Error in .multm(e1, e2, "AbscontDistribution") : length of operator must be 1"
First, note that if
A ~ Gamma(k1 + b1*xi, t1)
B ~ Gamma(k2 + b2*xi, t2)
then
p*A ~ Gamma(k1 + b1*xi, p*t1)
(1 - p)*B ~ Gamma(k2 + b2*xi, (1 - p)*t2)
The closed-form PDF of the sum of two gamma random variates is known and involves Kummer's (confluent hypergeometric) function.
Using chf_1F1
from the scModels
package, I believe the following function would return the log-likelihoods.
library(scModels)
LL <- function(k1, b1, t1, k2, b2, t2, x, p, y) {
a1 <- k1 + b1*x
a2 <- k2 + b2*x
a <- a1 + a2
b1 <- 1/(p*t1)
b2 <- 1/((1 - p)*t2)
a1*log(b1) + a2*log(b2) - lgamma(a) - b1*y + (a - 1)*log(y) + chf_1F1((b1 - b2)*y, a2, a)
}