I am building some rather big Bayesian networks for generating synthetic data, and I find pomegranate to be a good option as it generates data quickly and easily allows for inputting evidence. I have one problem with it: saving the trained models. Pomegranate's built-in method stores the model as a JSON so big that I run out of memory once I have 30 or so variables, even when using "lighter" structure-learning algorithms. The models cannot be pickled either, due to the error
TypeError: self.distributions_ptr,self.parent_count,self.parent_idxs cannot be converted to a Python object for pickling
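For reference, this is roughly what I am doing, with random stand-in data (pomegranate's pre-1.0 API; my real pipeline is bigger, so treat this as a sketch rather than an exact reproducer):

import pickle
import numpy as np
from pomegranate import BayesianNetwork

# stand-in for my real data: 30 binary variables, 10k rows
X = np.random.randint(0, 2, size=(10000, 30))

# one of the "lighter" structure-learning algorithms
model = BayesianNetwork.from_samples(X, algorithm='chow-liu')

# built-in serialization: on my real data this JSON gets too big for memory
with open("model.json", "w") as f:
    f.write(model.to_json())

# pickling fails with the TypeError quoted above
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)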
I am wondering if anyone has a good alternative for storing pomegranate models, or else knows of a Bayesian Network library that generates data quickly after training. I would be grateful for any tips.
If your model can be learned and stored in memory, it can be saved to a file, but maybe not by pickling. There are many different formats for Bayesian networks (bif, xmlbif, dsl, uai, etc.). I don't know pomegranate, but there is certainly a way to read/save using such a format. With pyAgrum (of which I am one of the authors), you just have to write gum.saveBN(model, "model.xxx") to save it, and then bn = gum.loadBN("model.xxx") to read it back. You can choose xxx among all the supported formats, which are for now: bif|dsl|net|bifxml|o3prm|uai (https://pyagrum.readthedocs.io/en/1.3.1/functions.html#pyAgrum.loadBN).
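For example, a minimal round-trip sketch (the filename extension selects the format):

import pyAgrum as gum

# build a small BN with random CPTs (stand-in for a learned model)
model = gum.fastBN("A->B->C;A->D")

# the ".bif" extension selects the BIF format
gum.saveBN(model, "model.bif")

# load it back later, e.g. to generate more data
bn = gum.loadBN("model.bif")
print(bn)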
As far as I understand, evidence for sampling is just a way to filter the samples, keeping only those that respect the constraints (rejection sampling). There is no such direct method in pyAgrum, but it can be done as a post-processing step:
import pyAgrum as gum

# create a BN with random CPTs
bn = gum.fastBN("A->B{yes|maybe|no}<-C->D->E<-F<-B")

# generate a sample of size 100
g = gum.BNDatabaseGenerator(bn)
g.setRandomVarOrder()
g.drawSamples(100)
df = g.to_pandas()

# keep only the samples compatible with the evidence B=yes, E=1
rslt_df = df[(df['B'] == "yes") &
             (df['E'] == "1")]
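If you need a fixed number of samples that satisfy the evidence, you can wrap this in a loop and keep drawing until enough rows survive the filter. The helper below is just an illustrative sketch of that idea (draw_until_enough is not a pyAgrum function):

import pandas as pd
import pyAgrum as gum

def draw_until_enough(bn, evidence, n_wanted, batch_size=1000):
    # evidence is a dict {variable name: label}, e.g. {"B": "yes", "E": "1"}
    g = gum.BNDatabaseGenerator(bn)
    kept = []
    n_kept = 0
    while n_kept < n_wanted:
        g.drawSamples(batch_size)        # draw a fresh batch
        df = g.to_pandas()
        for var, label in evidence.items():
            df = df[df[var] == label]    # reject rows that violate the evidence
        kept.append(df)
        n_kept += len(df)
    return pd.concat(kept, ignore_index=True).head(n_wanted)

bn = gum.fastBN("A->B{yes|maybe|no}<-C->D->E<-F<-B")
samples = draw_until_enough(bn, {"B": "yes", "E": "1"}, n_wanted=100)

Keep in mind that if the evidence is very unlikely, most of each batch gets rejected, so this can be slow (and it will not terminate for impossible evidence); a larger batch_size helps amortize the cost.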