rpackagedistributionmixture-modeldirichlet

How to fit a finite mixture of Dirichlet distributions


I have a compositional sample and I would like to fit a finite mixture of Dirichlet distributions. To be more precise, consider the following example:

library(gtools)
set.seed(1)
PROB = c(0.25, 0.15, 0.60)
ALPHA = list(
  c(1,1,1),
  c(2,1,1),
  c(1,1,20)
)
size = 500

N = sapply(1:3, function(i, z) sum(z == i),
           sample(1:3, size, prob = PROB, replace = TRUE))

X = do.call('rbind', 
            sapply(1:3, function(i, N) 
              rdirichlet(N[i], ALPHA[[i]]), N))[sample(1:size),]

X contains sample generated from a mixture of Dirichlet distributions defined in the 3-part simplex. The first Dirichlet component of this mixture has parameter (1,1,1), the second component has parameter (2,1,1) and the third (1,1,20). The mixture probabilities are 0.25, 0.15, 0.60. I would like to retrieve these parameters from the sample.

How would you find this parameters?


Solution

  • Reparameterizing in terms of theta1=log(p1/p3), theta2=log(p2/p3) and logs of all 9 alpha parameters, and then maximising the log likelihood using optim() with method="BFGS" seems to work if using initial values sufficiently close to the parameter values used to simulate the data. At least, all eigenvalues of the Hessian are negative, and small changes in initial values leads to the the same optimum.

    repar <- function(theta) {
      p <- exp(theta[1])
      p[2] <- exp(theta[2])
      p[3] <- 1
      p <- p/sum(p)
      alpha <- matrix(exp(theta[3:11]),3,3,byrow=TRUE)
      list(p=p,alpha=alpha)
    }
    logL <- function(theta,x) {
      par <- repar(theta)
      p <- par$p
      alpha <- par$alpha
      terms <- 0
      for (i in 1:length(p)) {
        terms <- terms + p[i]*ddirichlet(x,alpha[i,])
      }
      -sum(log(terms))
    }
    start <- c(log(c(.25,.15)/.6), log(c(1,1,1, 2,1,1, 1,1,20)))
    fit <- optim(start,logL,x=X,hessian=TRUE,method="BFGS")
    repar(fit$par)
    eigen(fit$hessian)$val
    fit2 <- optim(start+rnorm(11,sd=.2),logL,x=X,hessian=TRUE,method="BFGS")
    repar(fit2$par)