I want to understand how the Bayesian GPLVM implementation works in GPflow, but I am struggling with a few lines of the code. I would greatly appreciate any help with the following questions:
- I understand that matrix B in line gplvm.py:182:

```python
B = AAT + tf.eye(num_inducing, dtype=default_float())
```

  corresponds to $\beta\Psi_2 + K_{MM}$ in Eq. 14 of Titsias and Lawrence 2010. However, I don't understand how the code implements this expression. The relevant surrounding lines are:
```python
A = tf.linalg.triangular_solve(L, tf.transpose(psi1), lower=True) / sigma
tmp = tf.linalg.triangular_solve(L, psi2, lower=True)
AAT = tf.linalg.triangular_solve(L, tf.transpose(tmp), lower=True) / sigma2
B = AAT + tf.eye(num_inducing, dtype=default_float())
LB = tf.linalg.cholesky(B)
log_det_B = 2.0 * tf.reduce_sum(tf.math.log(tf.linalg.diag_part(LB)))
c = tf.linalg.triangular_solve(LB, tf.linalg.matmul(A, Y_data), lower=True) / sigma
```
- Related to the previous question, I cannot understand what A, tmp, AAT and c mean in the code. I am guessing the code is using the matrix inversion lemma, but I cannot see how.
- In Eq. 14 from Titsias and Lawrence 2010, there are three terms whose computation in gplvm.py I cannot follow:

  - $0.5\,\beta^2\, y_d^\top \Psi_1 (\beta\Psi_2 + K_{MM})^{-1} \Psi_1^\top y_d$
  - $0.5\, D\, \beta\, \mathrm{Tr}(K_{MM}^{-1} \Psi_2)$
  - $0.5\, D \log |K_{MM}|$
I would greatly appreciate any hint.
Cordially, Joaquin
The code that computes the ELBO in gplvm.py is very elegant and efficient. In case anyone wants to understand it, I answer my previous questions below and add some further notes.
- I understand that matrix B in line gplvm.py:182:
```python
B = AAT + tf.eye(num_inducing, dtype=default_float())
```
corresponds to $\beta\Psi_2 + K_{MM}$ in Eq. 14 of Titsias and Lawrence 2010. However, I did not at first understand how the gplvm code implements the expression in the paper.
Call the matrix $\beta\Psi_2 + K_{MM}$ in Eq. 14 of Titsias and Lawrence 2010 (hereafter TL10) D. In gplvm.py this matrix is represented implicitly as $D = L B L^\top$, where B is the matrix given above (i.e., B = AAT + I) and L is the Cholesky factor of $K_{MM}$ (i.e., $K_{MM} = L L^\top$). Indeed, since AAT equals $\beta L^{-1} \Psi_2 L^{-\top}$, we have $L B L^\top = \beta\Psi_2 + L L^\top = \beta\Psi_2 + K_{MM}$.
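As a sanity check, here is a minimal numerical sketch of this identity with random stand-in matrices for $\Psi_2$ and $K_{MM}$ (the sizes and values are made up for illustration; only the algebra mirrors gplvm.py):

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(0)
M = 5                               # number of inducing points
sigma2 = 0.3                        # noise variance; beta = 1 / sigma2

R = rng.standard_normal((M, M))
Kmm = R @ R.T + M * np.eye(M)       # positive-definite stand-in for K_MM
S = rng.standard_normal((M, M))
psi2 = S @ S.T                      # positive semi-definite stand-in for Psi_2

L = np.linalg.cholesky(Kmm)
# AAT = L^{-1} Psi_2 L^{-T} / sigma^2, via triangular solves as in gplvm.py
tmp = solve_triangular(L, psi2, lower=True)
AAT = solve_triangular(L, tmp.T, lower=True) / sigma2
B = AAT + np.eye(M)

# L B L^T reconstructs beta * Psi_2 + K_MM
assert np.allclose(L @ B @ L.T, psi2 / sigma2 + Kmm)
```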
- Related to the previous question, I could not understand what A, tmp, AAT and c mean in the code:
```python
A = tf.linalg.triangular_solve(L, tf.transpose(psi1), lower=True) / sigma
tmp = tf.linalg.triangular_solve(L, psi2, lower=True)
AAT = tf.linalg.triangular_solve(L, tf.transpose(tmp), lower=True) / sigma2
B = AAT + tf.eye(num_inducing, dtype=default_float())
LB = tf.linalg.cholesky(B)
log_det_B = 2.0 * tf.reduce_sum(tf.math.log(tf.linalg.diag_part(LB)))
c = tf.linalg.triangular_solve(LB, tf.linalg.matmul(A, Y_data), lower=True) / sigma
```
I am guessing the code is using the matrix inversion lemma, but I cannot see how.
The code does not use the matrix inversion lemma; it works directly with the Cholesky factors of $K_{MM}$ and B. My reading of the intermediate quantities is given below.
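In matrix notation, with $K_{MM} = L L^\top$, $B = L_B L_B^\top$ and $\beta = 1/\sigma^2$, the intermediates work out to (my own derivation, not taken from the paper):

$$A = \tfrac{1}{\sigma} L^{-1} \Psi_1^\top, \qquad \mathrm{tmp} = L^{-1} \Psi_2, \qquad \mathrm{AAT} = \tfrac{1}{\sigma^2} L^{-1} \Psi_2 L^{-\top},$$

$$B = \beta L^{-1} \Psi_2 L^{-\top} + I, \qquad c = \tfrac{1}{\sigma} L_B^{-1} A\, Y.$$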
The data term in Eq. 14 of TL10 (i.e., the term in the exponential) is computed by taking the squared 2-norm of the vector c.
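A short derivation of why this works (my own, using the identities above and $L_B^{-\top} L_B^{-1} = B^{-1}$):

$$c^\top c = \tfrac{1}{\sigma^2}\, Y^\top A^\top B^{-1} A\, Y = \beta^2\, Y^\top \Psi_1 (L B L^\top)^{-1} \Psi_1^\top Y = \beta^2\, Y^\top \Psi_1 (\beta\Psi_2 + K_{MM})^{-1} \Psi_1^\top Y,$$

which, column by column of Y and up to the factor of 1/2, is exactly the first of the three terms listed below.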
AAT has the same trace as $\beta K_{MM}^{-1} \Psi_2$, i.e., $\beta$ times the matrix that appears inside the trace of the last term in Eq. 14 of TL10 ($K_{MM}^{-1} \Psi_2$).
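To see this, use $K_{MM} = L L^\top$ and the cyclic property of the trace:

$$\mathrm{Tr}(\mathrm{AAT}) = \tfrac{1}{\sigma^2}\, \mathrm{Tr}(L^{-1} \Psi_2 L^{-\top}) = \tfrac{1}{\sigma^2}\, \mathrm{Tr}(\Psi_2\, L^{-\top} L^{-1}) = \beta\, \mathrm{Tr}(K_{MM}^{-1} \Psi_2).$$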
- In Eq. 14 from Titsias and Lawrence 2010, there are three terms whose computation I could not follow:
  - $0.5\,\beta^2\, y_d^\top \Psi_1 (\beta\Psi_2 + K_{MM})^{-1} \Psi_1^\top y_d$
  - $0.5\, D\, \beta\, \mathrm{Tr}(K_{MM}^{-1} \Psi_2)$
  - $0.5\, D \log |K_{MM}|$
As mentioned above, the first term is calculated by taking the squared 2-norm of the vector c, and the second term by taking the trace of AAT. The third term is not computed on its own: the difference $\log|\beta\Psi_2 + K_{MM}| - \log|K_{MM}|$, which accounts for both the third term and the other log determinant in Eq. 14 of TL10, equals $\log|B|$ (log_det_B in the code).
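This follows from the factorization $\beta\Psi_2 + K_{MM} = L B L^\top$:

$$\log|\beta\Psi_2 + K_{MM}| = \log|L| + \log|B| + \log|L^\top| = \log|K_{MM}| + \log|B|,$$

so the two determinant terms of Eq. 14 collapse into $-0.5\, D \log|B|$, which the code obtains cheaply from the diagonal of the Cholesky factor $L_B$.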
Beautiful piece of code. Thanks.