I wrote a function to calculate the log-likelihood of a set of observations sampled from a mixture of two normal distributions. This function is not giving me the correct answer.
I will not know which of the two distributions any given sample is from, so the function needs to sum over possibilities.
This function takes a vector of five model parameters as its first argument (μ1, σ1, μ2, σ2 and p) where μi and σi are the mean and standard deviation of the ith distribution and p is the probability a sample is from the first distribution. For the second argument, the function takes a vector of observations.
I have written the following function:
mixloglik <- function(p, v) {
sum(log(dnorm(v, p[1], p[2])*p[5] + dnorm(v,p[3],p[4]))*p[5])
}
I can create test data, for which I know the solution should be ~ -854.6359:
set.seed(42)
v<- c(rnorm(100), rnorm(200, 8, 2))
p <- c(0, 1, 6, 2, 0.5)
When I test this function on the test data I do not get the correct solution
> mixloglik(p, v)
[1] -356.7194
I know the solution should be ~ -854.6359. Where am I going wrong in my function?
The correct expression for the log-likelihood is the following.
mixloglik <- function(p, v) {
sum(log(p[5]*dnorm(v, p[1], p[2]) + (1 - p[5])*dnorm(v, p[3], p[4])))
}
Now try it:
set.seed(42)
v<- c(rnorm(100), rnorm(200, 8, 2))
p <- c(0, 1, 6, 2, 0.5)
mixloglik(p, v)
#[1] -854.6359
In cases like this, the best way to solve the error is to restart by rewriting the expression on paper and recode it.