rmathematical-optimizationmleprobability-distribution

How to maximize joint likelihood function with different (but fixed) means


The goal

Optimize a common dispersion parameter across groups of data over fixed mu, where mu changes by group.

The problem

I have n=10 groups of data, each of which I assume is a random sample from a negative binomial distribution with fixed $mu=m_i$.

# Simulate data in hand: 

# N=10 vectors of variable length
# fixed data:
N <- 10
ns <- sample(1:100, N)
# fixed means:
mus <- sample(1:100, N)
list_v <- lapply(1:N, function(i){ rnbinom(ns[i], size=3, mu=mus[i]) })
list_mu <- as.list(mus)

I'd like to estimate a dispersion (size) parameter that is common to all groups. So I'd like to optimize the joint maximum likelihood over the size parameter. I wrote a function negjloglik_nbinom that can handle the varying mu parameters:

# Define functions:

loglik_nbinom <- function(v, size, m){
  log(dnbinom(v, mu=m, size=size))
}

neg_jloglik_nbinom <- function(disp, v_list, mu_list){
  # Likelihood for mean m
  ind_lls <- list()
  for(i in 1:length(v_list)){
    ind_lls[[i]] <- loglik_nbinom(size=disp, v=v_list[[i]], m=mu_list[[i]])
  }
  # log(product(likelihoods)) == sum(log(likelihoods))
  (-1)*sum(unlist(ind_lls))
}

which I then attempt to pass to bbmle::mle2:

  fit <- bbmle::mle2(minuslogl=neg_jloglik_nbinom,
                     start=c(disp=3),
                     fixed=list(v_list=list_v,
                                mu_list=list_mu)
                     )

And this throws a strange error:

Error in bbmle::mle2(minuslogl = neg_jloglik_nbinom, start = c(disp = 3),  : 
  some named arguments in 'fixed' are not arguments to the specified log-likelihood function:v_list1, v_list2, v_list3, v_list4, v_list5, v_list6, v_list7, v_list8, v_list9, v_list10, v_list11, v_list12, v_list13, v_list14, v_list15, v_list16, v_list17, v_list18, v_list19, v_list20, v_list21, v_list22, v_list23, v_list24, v_list25, v_list26, v_list27, v_list28, v_list29, v_list30, v_list31, v_list32, v_list33, v_list34, v_list35, v_list36, v_list37, v_list38, v_list39, v_list40, v_list41, v_list42, v_list43, v_list44, v_list45, v_list46, v_list47, v_list48, v_list49, v_list50, v_list51, v_list52, v_list53, v_list54, v_list55, v_list56, v_list57, v_list58, v_list59, v_list60, v_list61, v_list62, v_list63, v_list64, v_list65, v_list66, v_list67, v_list68, v_list69, v_list70, v_list71, v_list72, v_list73, v_list74, v_list75, v_list76, v_list77, v_list78, v_list79, v_list80, v_list81, v_list82, v_list83, v_list84, v_list85, v_list86, v_list87, v_list88, v_list89, v_list90, v_list91,

What works:

The optimization doesn't end up being a problem if v_list and mu_list are not passed as function arguments, and instead neg_jloglik_nbinom finds them in the environment. This doesn't seem ideal but I'll live with it if I have to!

# Rewrite objective function without list args:
neg_jloglik_nbinom <- function(disp){
  # Likelihood for mean m
  ind_lls <- list()
  for(i in 1:length(v_list)){
    ind_lls[[i]] <- loglik_nbinom(size=disp, v=v_list[[i]], m=mu_list[[i]])
  }
  # log(product(likelihoods)) == sum(log(likelihoods))
  (-1)*sum(unlist(ind_lls))
}

# Assign lists to vars in environment:
v_list=list_v
mu_list=list_mu

# Compute optimization without specifying any fixed parameters:
fit <- bbmle::mle2(minuslogl=neg_jloglik_nbinom,
                   start=c(disp=3))

Solution

  • The error occurs because bbmle is internally converting the parameters to a vector ... (the internals of bbmle are too complicated and desperately need to be refactored at some point ...)

    How about converting the data to long format with a group-ID factor and using the data argument to pass the information?

    ## convert to long format
    dd <- data.frame(v = unlist(list_v), 
                     f = factor(rep(1:length(list_v), lengths(list_v))))
    dd$mu <- unlist(list_mu[dd$f])
    ## fit
    mle2(v ~ dnbinom(mu = mu, size = exp(logsize)), 
             data = dd,
             start = list(logsize = 0))
    

    If you want to fit the means instead of fixing them to a known value, you can use the parameters= argument to fit different values of the mean for each group ... ?

    In glmmTMB you can use the map= and start= arguments to specify fixed parameter values ...

    ## convert to long format
    dd <- data.frame(v = unlist(list_v), 
                     f = factor(rep(1:length(list_v), lengths(list_v))))
    ## fit
    mle2(v ~ dnbinom(mu = exp(logmu), size = exp(logsize)),
         parameters = list(logmu ~ f - 1),
         data = dd,
         start = list(logmu = 0, logsize = 0))