rlog-likelihood

Calculating the log-likelihood of a set of observations sampled from a mixture of two normal distributions using R


I wrote a function to calculate the log-likelihood of a set of observations sampled from a mixture of two normal distributions. This function is not giving me the correct answer.

I will not know which of the two distributions any given sample is from, so the function needs to sum over possibilities.

This function takes a vector of five model parameters as its first argument (μ1, σ1​, μ2​, σ2​ and p) where μi​ and σi​ are the mean and standard deviation of the ith distribution and p is the probability a sample is from the first distribution. For the second argument, the function takes a vector of observations.

I have written the following function:

mixloglik <- function(p, v) {
    sum(log(dnorm(v, p[1], p[2])*p[5] + dnorm(v,p[3],p[4]))*p[5])
}

I can create test data, for which I know the solution should be ~ -854.6359:

set.seed(42)
v<- c(rnorm(100), rnorm(200, 8, 2))
p <- c(0, 1, 6, 2, 0.5)

When I test this function on the test data I do not get the correct solution

> mixloglik(p, v)
[1] -356.7194

I know the solution should be ~ -854.6359. Where am I going wrong in my function?


Solution

  • The correct expression for the log-likelihood is the following.

    mixloglik <- function(p, v) {
      sum(log(p[5]*dnorm(v, p[1], p[2]) + (1 - p[5])*dnorm(v, p[3], p[4])))
    }
    

    Now try it:

    set.seed(42)
    v<- c(rnorm(100), rnorm(200, 8, 2))
    p <- c(0, 1, 6, 2, 0.5)
    mixloglik(p, v)
    #[1] -854.6359
    

    In cases like this, the best way to solve the error is to restart by rewriting the expression on paper and recode it.