I'm using the bnlearn package in R, and I would like to know how the package calculates BIC-g (the BIC score for Gaussian distributions).
Let's make a structure; I can then find the BIC score as follows:
library(bnlearn)
X = iris[, 1:3]
names(X) = c("A", "B", "C")
Network = empty.graph(names(X))
bnlearn::score(Network, X, type="bic-g")
bnlearn provides me with more detailed information about how this score is calculated:
bnlearn::score(Network, X, type="bic-g", debug=TRUE)
And this results in
----------------------------------------------------------------
* processing node A.
> loglikelihood is -184.041441.
> penalty is 2.505318 x 2 = 5.010635.
----------------------------------------------------------------
* processing node B.
> loglikelihood is -87.777815.
> penalty is 2.505318 x 2 = 5.010635.
----------------------------------------------------------------
* processing node C.
> loglikelihood is -297.588727.
> penalty is 2.505318 x 2 = 5.010635.
[1] -584.4399
I know how to calculate the BIC for discrete data in Bayesian networks (referring to here), but I do not know how this generalizes to the joint Gaussian (multivariate normal) case.
It must be related to the likelihood and the penalty term, and it seems the package calculates a likelihood and a penalty for each node and then sums them; the log-likelihood part can be obtained on its own with
bnlearn::score(Network, X, type="loglik-g", debug=TRUE)
But I would like to know how to calculate these likelihoods and penalties myself, given the data.
I found material that explains the Laplace approximation (refer to page 57), but I could not relate it to these calculations. Could anyone help me out?
The BIC is calculated as

BIC = -2 * logLik + nparams * log(nobs)

but in bnlearn this is rescaled by a factor of -1/2 (see ?score) to give

BIC = logLik - 0.5 * nparams * log(nobs)
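As a quick illustration of the rescaling (my own check with a single regression, using R's built-in BIC() and logLik(), not bnlearn):

fit = lm(Sepal.Width ~ Sepal.Length, iris)
BIC(fit)                                                           # -2 * logLik + nparams * log(nobs)
as.numeric(logLik(fit)) - 0.5 * attr(logLik(fit), "df") * log(nrow(iris))  # bnlearn-style rescaling
-0.5 * BIC(fit)                                                    # identical to the previous line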
So for your example, with no edges, the likelihood is calculated using the marginal means and errors. More generally, the number of parameters for each node is 1 (intercept) + 1 (residual error) + the number of parents, e.g.
library(bnlearn)
X = iris[, 1:3]
names(X) = c("A", "B", "C")
Network = empty.graph(names(X))
# each node's log-likelihood uses its marginal mean and standard deviation
(ll = sum(sapply(X, function(i) dnorm(i, mean(i), sd(i), log=TRUE))))
#[1] -569.408
# 6 parameters in total: 3 nodes x (intercept + residual error)
(penalty = 0.5 * log(nrow(X)) * 6)
#[1] 15.03191
ll - penalty
#[1] -584.4399
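The per-node contributions can also be checked against the debug output in the question (my own check, not bnlearn internals):

sapply(X, function(i) sum(dnorm(i, mean(i), sd(i), log=TRUE)))
# A: -184.0414, B: -87.7778, C: -297.5887, matching the debug output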
If there were an edge, you would calculate the log-likelihood using the fitted values and residual errors. For the net:
Network = set.arc(Network, "A", "B")
We need the log-likelihood components for nodes A and C
(llA = with(X, sum(dnorm(A, mean(A), sd(A), log=TRUE))))
#[1] -184.0414
(llC = with(X, sum(dnorm(C, mean(C), sd(C), log=TRUE))))
#[1] -297.5887
and we get B's conditional distribution from a linear regression of B on its parent A:
m = lm(B ~ A, X)
# stats::sigma(m) is the residual standard error of the regression
(llB = with(X, sum(dnorm(B, fitted(m), stats::sigma(m), log=TRUE))))
#[1] -86.73894
Giving
(ll = llA + llB + llC)
#[1] -568.3691
(penalty = 0.5 * log(nrow(X)) * 7)
#[1] 17.53722
ll - penalty
#[1] -585.9063
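The 7 parameters break down as 2 for A, 3 for B and 2 for C: B now has an intercept, a coefficient on A and a residual error, which is where the "2.505318 x 3" term in the debug output below comes from. A quick check of the count (my own arithmetic):

length(coef(m)) + 1
# [1] 3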
# bnlearn::score(Network, X, type="bic-g", debug=TRUE)
# ----------------------------------------------------------------
# * processing node A.
# loglikelihood is -184.041441.
# penalty is 2.505318 x 2 = 5.010635.
# ----------------------------------------------------------------
# * processing node B.
# loglikelihood is -86.738936.
# penalty is 2.505318 x 3 = 7.515953.
# ----------------------------------------------------------------
# * processing node C.
# loglikelihood is -297.588727.
# penalty is 2.505318 x 2 = 5.010635.
# [1] -585.9063
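Putting it together, here is a minimal sketch (my own helper, not bnlearn's internal code) that reproduces this per-node decomposition for an arbitrary DAG: each node is regressed on its parents, the log-likelihood uses the fitted values and residual standard error, and the penalty counts intercept + parents + residual error.

bic_g = function(network, data) {
  sum(sapply(bnlearn::nodes(network), function(node) {
    pars = bnlearn::parents(network, node)
    # root nodes get an intercept-only regression, i.e. marginal mean and sd
    f = reformulate(if (length(pars) == 0) "1" else pars, response = node)
    m = lm(f, data = data)
    ll = sum(dnorm(data[[node]], fitted(m), stats::sigma(m), log = TRUE))
    penalty = 0.5 * log(nrow(data)) * (length(pars) + 2)
    ll - penalty
  }))
}

bic_g(Network, X)
# [1] -585.9063  (matches bnlearn::score(Network, X, type="bic-g"))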