I'm trying to use the pgmpy Python package to learn the transition probabilities between a certain set of states; however, when I fit the model, I find that the learned conditional probabilities are incorrect.
As a very simplified example of the sort of issue I'm talking about, consider a Bayesian network consisting of two binary variables, A and B, with a single directed edge running from A to B. And suppose we have observed that whenever A is zero, B is one, and whenever A is one, B is zero. The code describing this situation is:
import pandas as pd
from pgmpy.models import BayesianModel

# Whenever A is 0, B is 1; whenever A is 1, B is 0
data = pd.DataFrame(data={'A': [0, 0, 1, 1, 1, 1], 'B': [1, 1, 0, 0, 0, 0]})

# Network with a single directed edge A -> B
model = BayesianModel([('A', 'B')])
model.fit(data)
However, when we then inspect the fitted conditional probability distribution for B by printing model.cpds[1], we find that pgmpy has learned the following:
+------+------+------+
| A | A(0) | A(1) |
+------+------+------+
| B(0) | 0.5 | 0.5 |
+------+------+------+
| B(1) | 0.5 | 0.5 |
+------+------+------+
when it should have learned:
+------+------+------+
| A | A(0) | A(1) |
+------+------+------+
| B(0) | 0.0 | 1.0 |
+------+------+------+
| B(1) | 1.0 | 0.0 |
+------+------+------+
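For what it's worth, the expected conditional frequencies can be double-checked straight from the data with plain pandas (continuing the session above):

# Conditional frequency of B given A, computed directly from the observations
print(data.groupby('A')['B'].value_counts(normalize=True))
# A=0 -> B=1 with proportion 1.0; A=1 -> B=0 with proportion 1.0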
Can someone please explain to me what is going on here? This is an extremely basic example, and I feel like I'm going crazy. Thanks
The version of pgmpy available for installation through pip has a bug that causes it to compute conditional probabilities incorrectly. Cloning the dev repository from GitHub and installing it manually fixes the issue. Thanks to @lstbl for figuring this out here: https://stats.stackexchange.com/questions/292738/inconsistencies-between-conditional-probability-calculations-by-hand-and-with-pg
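As a quick check after installing the dev version, refitting the same model should produce the deterministic table from the question. A minimal sketch: model.fit uses maximum-likelihood estimation by default, and passing the estimator explicitly just makes that choice visible.

import pandas as pd
from pgmpy.models import BayesianModel
from pgmpy.estimators import MaximumLikelihoodEstimator

data = pd.DataFrame(data={'A': [0, 0, 1, 1, 1, 1], 'B': [1, 1, 0, 0, 0, 0]})

model = BayesianModel([('A', 'B')])
# Fit with explicit maximum-likelihood estimation (the default behaviour)
model.fit(data, estimator=MaximumLikelihoodEstimator)

# With the fixed version, this prints P(B=1|A=0) = 1.0 and P(B=0|A=1) = 1.0
print(model.get_cpds('B'))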