I'm using the bnlearn package in R, and I would like to know how the package calculates BIC-g (the BIC score for Gaussian distributions).
Let's make a structure; I can then find the BIC score as follows:
library(bnlearn)
X = iris[, 1:3]
names(X) = c("A", "B", "C")
Network = empty.graph(names(X))
bnlearn::score(Network, X, type="bic-g")
bnlearn provides me with more detailed information about how this score is calculated:
bnlearn::score(Network, X, type="bic-g", debug=TRUE)
And this results in
----------------------------------------------------------------
* processing node A.
> loglikelihood is -184.041441.
> penalty is 2.505318 x 2 = 5.010635.
----------------------------------------------------------------
* processing node B.
> loglikelihood is -87.777815.
> penalty is 2.505318 x 2 = 5.010635.
----------------------------------------------------------------
* processing node C.
> loglikelihood is -297.588727.
> penalty is 2.505318 x 2 = 5.010635.
[1] -584.4399
I know how to calculate the BIC for discrete data in Bayesian networks (referring to here), but I do not know how this generalizes to the joint Gaussian (multivariate normal) case.
It must be related to the likelihood and the penalty term, and it seems the package calculates a likelihood and a penalty for each node and then sums them; the log-likelihood part can be obtained on its own with
bnlearn::score(Network, X, type="loglik-g", debug=TRUE)
But I would like to know how to calculate these likelihoods and penalties myself, given the data.
I found material that explains the Laplace approximation (refer to page 57), but I could not relate it to these calculations. Could anyone help me out?
The BIC is calculated as

BIC = -2 * logLik + nparams * log(nobs)

but in bnlearn this is rescaled by a factor of -1/2 (see ?score) to give

BIC = logLik - 0.5 * nparams * log(nobs)
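As a quick illustration of the rescaling (my own check with a single regression, using R's built-in BIC() and logLik(), not bnlearn):

fit = lm(Sepal.Width ~ Sepal.Length, iris)
BIC(fit)                                                           # -2 * logLik + nparams * log(nobs)
as.numeric(logLik(fit)) - 0.5 * attr(logLik(fit), "df") * log(nrow(iris))  # bnlearn-style rescaling
-0.5 * BIC(fit)                                                    # identical to the previous line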
So for your example, with no edges, the likelihood is calculated using the marginal means and errors. More generally, the number of parameters for each node is 1 (intercept) + 1 (residual error) + the number of parents, e.g.
library(bnlearn)
X = iris[, 1:3]
names(X) = c("A", "B", "C")
Network = empty.graph(names(X))
# each node's log-likelihood uses its marginal mean and standard deviation
(ll = sum(sapply(X, function(i) dnorm(i, mean(i), sd(i), log=TRUE))))
#[1] -569.408
# 6 parameters in total: 3 nodes x (intercept + residual error)
(penalty = 0.5 * log(nrow(X)) * 6)
#[1] 15.03191
ll - penalty
#[1] -584.4399
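The per-node contributions can also be checked against the debug output in the question (my own check, not bnlearn internals):

sapply(X, function(i) sum(dnorm(i, mean(i), sd(i), log=TRUE)))
# A: -184.0414, B: -87.7778, C: -297.5887, matching the debug output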
If there were an edge, you would calculate the log-likelihood using the fitted values and residual errors. For the net:
Network = set.arc(Network, "A", "B")
We need the log-likelihood components for nodes A and C
(llA = with(X, sum(dnorm(A, mean(A), sd(A), log=TRUE))))
#[1] -184.0414
(llC = with(X, sum(dnorm(C, mean(C), sd(C), log=TRUE))))
#[1] -297.5887
and we get B's conditional distribution from a linear regression of B on its parent A:
m = lm(B ~ A, X)
# stats::sigma(m) is the residual standard error of the regression
(llB = with(X, sum(dnorm(B, fitted(m), stats::sigma(m), log=TRUE))))
#[1] -86.73894
Giving
(ll = llA + llB + llC)
#[1] -568.3691
(penalty = 0.5 * log(nrow(X)) * 7)
#[1] 17.53722
ll - penalty
#[1] -585.9063
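The 7 parameters break down as 2 for A, 3 for B and 2 for C: B now has an intercept, a coefficient on A and a residual error, which is where the "2.505318 x 3" term in the debug output below comes from. A quick check of the count (my own arithmetic):

length(coef(m)) + 1
# [1] 3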
# bnlearn::score(Network, X, type="bic-g", debug=TRUE)
# ----------------------------------------------------------------
# * processing node A.
# loglikelihood is -184.041441.
# penalty is 2.505318 x 2 = 5.010635.
# ----------------------------------------------------------------
# * processing node B.
# loglikelihood is -86.738936.
# penalty is 2.505318 x 3 = 7.515953.
# ----------------------------------------------------------------
# * processing node C.
# loglikelihood is -297.588727.
# penalty is 2.505318 x 2 = 5.010635.
# [1] -585.9063
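Putting it together, here is a minimal sketch (my own helper, not bnlearn's internal code) that reproduces this per-node decomposition for an arbitrary DAG: each node is regressed on its parents, the log-likelihood uses the fitted values and residual standard error, and the penalty counts intercept + parents + residual error.

bic_g = function(network, data) {
  sum(sapply(bnlearn::nodes(network), function(node) {
    pars = bnlearn::parents(network, node)
    # root nodes get an intercept-only regression, i.e. marginal mean and sd
    f = reformulate(if (length(pars) == 0) "1" else pars, response = node)
    m = lm(f, data = data)
    ll = sum(dnorm(data[[node]], fitted(m), stats::sigma(m), log = TRUE))
    penalty = 0.5 * log(nrow(data)) * (length(pars) + 2)
    ll - penalty
  }))
}

bic_g(Network, X)
# [1] -585.9063  (matches bnlearn::score(Network, X, type="bic-g"))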