I am trying to graph a Bayesian Network (BN) with instantiated nodes using the libraries bnlearn
and Rgraphviz
. My workflow is as follow:
After creating a data frame with random data (the data I am actually using is obviously not random) I then discretise the data, structure learn the directed acyclic graph (DAG), fit the data to the DAG and then plot the DAG. I also plot a DAG which shows the posterior probabilities of each of the nodes.
#rm(list = ls())
library(bnlearn)
library(Rgraphviz)
# Generating random dataframe
data_clean <- data.frame(a = runif(min = 0, max = 100, n = 1000),
b = runif(min = 0, max = 100, n = 1000),
c = runif(min = 0, max = 100, n = 1000),
d = runif(min = 0, max = 100, n = 1000),
e = runif(min = 0, max = 100, n = 1000))
# Discretising the data into 3 bins
bins <- 3
data_discrete <- discretize(data_clean, breaks = bins)
# Creating factors for each bin in the data
lv <- c("low", "med", "high")
for (i in names(data_discrete)){
levels(data_discrete[, i]) = lv
}
# Structure learning the DAG from the training set
whitelist <- matrix(c("a", "b",
"b", "c",
"c", "e",
"a", "d",
"d", "e"),
ncol = 2, byrow = TRUE, dimnames = list(NULL, c("from", "to")))
bn.hc <- hc(data_discrete, whitelist = whitelist)
# Plotting the DAG
dag.hc <- graphviz.plot(bn.hc,
layout = "dot")
# Fitting the data to the structure
fitted <- bn.fit(bn.hc, data = data_discrete, method = "bayes")
# Plotting the DAG with posteriors
graphviz.chart(fitted, type = "barprob", layout = "dot")
The next thing I do is to manually change the distributions in the bn.fit
object, assigned to fitted
, and then plot a DAG that shows the instantiated nodes and the updated posterior probability of the response variable e
.
# Manually instantiating
fitted_evidence <- fitted
cpt.a = matrix(c(1, 0, 0), ncol = 3, dimnames = list(NULL, lv))
cpt.c = c(1, 0, 0,
0, 1, 0,
0, 0, 1)
dim(cpt.c) <- c(3, 3)
dimnames(cpt.c) <- list("c" = lv, "b" = lv)
cpt.b = c(1, 0, 0,
0, 1, 0,
0, 0, 1)
dim(cpt.b) <- c(3, 3)
dimnames(cpt.b) <- list("b" = lv, "a" = lv)
cpt.d = c(0, 0, 1,
0, 1, 0,
1, 0, 0)
dim(cpt.d) <- c(3, 3)
dimnames(cpt.d) <- list("d" = lv, "a" = lv)
fitted_evidence$a <- cpt.a
fitted_evidence$b <- cpt.b
fitted_evidence$c <- cpt.c
fitted_evidence$d <- cpt.d
# Plotting the DAG with instantiation and posterior for response
graphviz.chart(fitted_evidence, type = "barprob", layout = "dot")
This is the result I get but my actual BN is much larger with many more arcs and it would be impractical to manually change the bn.fit
object.
I would like to find out if there is a way to plot a DAG with instantiation without changing the bn.fit
object manually? Is there a workaround or function that I am missing?
I think/hope I have read the documentation for bnlearn thoroughly. I appreciate any feedback and would be happy to change anything in the question if I have not conveyed my thoughts clearly enough.
Thank you.
How about using cpdist
to draw samples from the posterior given the evidence. You can then estimate the updated parameters using bn.fit
using the cpdist
samples. Then plot as before.
An example:
set.seed(69184390) # for sampling
# Your evidence vector
ev <- list(a = "low", b="low", c="low", d="high")
# draw samples
updated_dat <- cpdist(fitted, nodes=bnlearn::nodes(fitted), evidence=ev, method="lw", n=1e6)
# refit : you'll get warnings over missing levels
updated_fit <- bn.fit(bn.hc, data = updated_dat)
# plot
par(mar=rep(0,4))
graphviz.chart(updated_fit, type = "barprob", layout = "dot")
Note I used bnlearn::nodes
as nodes
is masked by a dependency of Rgraphviz
. I tend to load bnlearn
last.