[SOLVED] Confusing output of tensorflow_probability.bijectors.ScaleMatvecDiag. Left or right multiplication?

The official documentation says: "Compute Y = g(X; scale) = scale @ X." From this I understood that scale is left-multiplied onto X, but ScaleMatvecDiag appears to compute X @ scale instead.

The following code produces the output shown after it:

import numpy as np
import tensorflow_probability as tfp
tfb = tfp.bijectors

x = [[1., 2.], [3., 4.]]
b = tfb.ScaleMatvecDiag(scale_diag=[-1., 2.])
b.forward(x)

[[-1.,  4.],
 [-3.,  8.]]

I was expecting the same result as:

np.diag([-1., 2.]) @ x

[[-1., -2.],
 [ 6.,  8.]]

From the following outputs, I see that ScaleMatvecDiag calculates X @ scale.

y = [[1., 2, 3], [4, 5, 6]]
z = [[1., 2], [3, 4], [5, 6]]

b.forward(y)  # ValueError: Dimensions 2 and 3 are not compatible
b.forward(z)  # works; the output has shape (3, 2)

I would appreciate it if anyone could clarify my misunderstanding.

Solution

I think there's a documentation bug.

In short, matvec != matmul (and note that @ is matmul, not matvec).

Ignoring "batching":

matmul takes inputs of shape [k, m], [m, n] and outputs [k, n]
matvec takes inputs of shape [k, m], [m] and outputs [k].

Taking batching into account:

matmul takes inputs of shape [batch, k, m], [batch, m, n] and outputs [batch, k, n]
matvec takes inputs of shape [batch, k, m], [batch, m] and outputs [batch, k].
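
As a rough illustration of the shape rules above, here is a small NumPy sketch. It uses np.einsum as a stand-in for a batched matvec (TensorFlow has tf.linalg.matvec for this); the sizes batch, k, m, n are arbitrary values picked for the example, not anything from the question.

```python
import numpy as np

# Arbitrary sizes, chosen only for illustration.
batch, k, m, n = 4, 2, 3, 5

A = np.ones((batch, k, m))   # batch of [k, m] matrices
B = np.ones((batch, m, n))   # batch of [m, n] matrices
v = np.ones((batch, m))      # batch of m-vectors

# matmul: [batch, k, m] x [batch, m, n] -> [batch, k, n]
matmul_out = np.matmul(A, B)

# matvec: [batch, k, m] x [batch, m] -> [batch, k]
# einsum is used here as a stand-in for a batched matrix-vector product.
matvec_out = np.einsum('bkm,bm->bk', A, v)

print(matmul_out.shape)
print(matvec_out.shape)
```

The point is only that the second argument of a matvec has one fewer "matrix" dimension than that of a matmul: its trailing axis is the vector, and everything before it is batch.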

The right-hand sides of your examples are being interpreted as batches of vectors:

shape [2, 2] => batch of two 2d vectors
shape [3, 2] => batch of three 2d vectors
shape [2, 3] => batch of two 3d vectors

Only the batches of 2-vectors are admissible to a matvec with a 2x2 left-hand side (matrix).
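
To make this concrete with the numbers from the question, here is a NumPy-only sketch (no TensorFlow needed). It assumes the bijector behaves like a matvec applied to each row of the input, as described above; np.einsum stands in for that batched matvec, and the variable names are just for this example.

```python
import numpy as np

scale = np.diag([-1., 2.])               # the 2x2 matrix built from scale_diag
x = np.array([[1., 2.], [3., 4.]])

# matvec view: each row of x is a separate 2-vector, and scale is applied to it.
batched_matvec = np.einsum('ij,bj->bi', scale, x)

# matmul view: x is treated as a single 2x2 matrix.
matmul = scale @ x

print(batched_matvec)   # matches the b.forward(x) output in the question
print(matmul)           # matches the np.diag([-1., 2.]) @ x output in the question

# Applying a matrix to every row is the same as right-multiplying by its
# transpose, which is why the result looks like X @ scale for a diagonal scale.
assert np.allclose(batched_matvec, x @ scale.T)

# Shape checks corresponding to y and z in the question.
y = np.ones((2, 3))   # batch of two 3-vectors: incompatible with a 2x2 matrix
z = np.ones((3, 2))   # batch of three 2-vectors: fine
try:
    np.einsum('ij,bj->bi', scale, y)
    y_failed = False
except ValueError:
    y_failed = True
z_shape = np.einsum('ij,bj->bi', scale, z).shape
print(y_failed, z_shape)
```

So the bijector really does left-multiply by scale, but it does so to each vector in the batch, not to the input taken as a whole matrix.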