Optimize a common dispersion parameter across groups of data over fixed mu, where mu changes by group.
I have n=10
groups of data, each of which I assume is a random sample from a negative binomial distribution with fixed $mu=m_i$.
# Simulate data in hand:
# N=10 vectors of variable length
# fixed data:
N <- 10
ns <- sample(1:100, N)
# fixed means:
mus <- sample(1:100, N)
list_v <- lapply(1:N, function(i){ rnbinom(ns[i], size=3, mu=mus[i]) })
list_mu <- as.list(mus)
I'd like to estimate a dispersion (size
) parameter that is common to all groups. So I'd like to optimize the joint maximum likelihood over the size
parameter. I wrote a function negjloglik_nbinom
that can handle the varying mu
parameters:
# Define functions:
loglik_nbinom <- function(v, size, m){
log(dnbinom(v, mu=m, size=size))
}
neg_jloglik_nbinom <- function(disp, v_list, mu_list){
# Likelihood for mean m
ind_lls <- list()
for(i in 1:length(v_list)){
ind_lls[[i]] <- loglik_nbinom(size=disp, v=v_list[[i]], m=mu_list[[i]])
}
# log(product(likelihoods)) == sum(log(likelihoods))
(-1)*sum(unlist(ind_lls))
}
which I then attempt to pass to bbmle::mle2
:
fit <- bbmle::mle2(minuslogl=neg_jloglik_nbinom,
start=c(disp=3),
fixed=list(v_list=list_v,
mu_list=list_mu)
)
And this throws a strange error:
Error in bbmle::mle2(minuslogl = neg_jloglik_nbinom, start = c(disp = 3), :
some named arguments in 'fixed' are not arguments to the specified log-likelihood function:v_list1, v_list2, v_list3, v_list4, v_list5, v_list6, v_list7, v_list8, v_list9, v_list10, v_list11, v_list12, v_list13, v_list14, v_list15, v_list16, v_list17, v_list18, v_list19, v_list20, v_list21, v_list22, v_list23, v_list24, v_list25, v_list26, v_list27, v_list28, v_list29, v_list30, v_list31, v_list32, v_list33, v_list34, v_list35, v_list36, v_list37, v_list38, v_list39, v_list40, v_list41, v_list42, v_list43, v_list44, v_list45, v_list46, v_list47, v_list48, v_list49, v_list50, v_list51, v_list52, v_list53, v_list54, v_list55, v_list56, v_list57, v_list58, v_list59, v_list60, v_list61, v_list62, v_list63, v_list64, v_list65, v_list66, v_list67, v_list68, v_list69, v_list70, v_list71, v_list72, v_list73, v_list74, v_list75, v_list76, v_list77, v_list78, v_list79, v_list80, v_list81, v_list82, v_list83, v_list84, v_list85, v_list86, v_list87, v_list88, v_list89, v_list90, v_list91,
The optimization doesn't end up being a problem if v_list
and mu_list
are not passed as function arguments, and instead neg_jloglik_nbinom
finds them in the environment. This doesn't seem ideal but I'll live with it if I have to!
# Rewrite objective function without list args:
neg_jloglik_nbinom <- function(disp){
# Likelihood for mean m
ind_lls <- list()
for(i in 1:length(v_list)){
ind_lls[[i]] <- loglik_nbinom(size=disp, v=v_list[[i]], m=mu_list[[i]])
}
# log(product(likelihoods)) == sum(log(likelihoods))
(-1)*sum(unlist(ind_lls))
}
# Assign lists to vars in environment:
v_list=list_v
mu_list=list_mu
# Compute optimization without specifying any fixed parameters:
fit <- bbmle::mle2(minuslogl=neg_jloglik_nbinom,
start=c(disp=3))
The error occurs because bbmle
is internally converting the parameters to a vector ... (the internals of bbmle
are too complicated and desperately need to be refactored at some point ...)
How about converting the data to long format with a group-ID factor and using the data
argument to pass the information?
## convert to long format
dd <- data.frame(v = unlist(list_v),
f = factor(rep(1:length(list_v), lengths(list_v))))
dd$mu <- unlist(list_mu[dd$f])
## fit
mle2(v ~ dnbinom(mu = mu, size = exp(logsize)),
data = dd,
start = list(logsize = 0))
If you want to fit the means instead of fixing them to a known value, you can use the parameters=
argument to fit different values of the mean for each group ... ?
In glmmTMB
you can use the map=
and start=
arguments to specify fixed parameter values ...
## convert to long format
dd <- data.frame(v = unlist(list_v),
f = factor(rep(1:length(list_v), lengths(list_v))))
## fit
mle2(v ~ dnbinom(mu = exp(logmu), size = exp(logsize)),
parameters = list(logmu ~ f - 1),
data = dd,
start = list(logmu = 0, logsize = 0))