python, pytorch, linear-algebra, numerical-methods, gaussian-process

Using matrix inversion in Torch versus Numpy in Gaussian Process Regression (previous question solved)


I am building a Torch-based Gaussian Process model that lets me use custom kernels and take advantage of automatic derivatives. However, I find that even in the simplest case, the NumPy-based implementation gives a very different result from the Torch-based one.

Allow me to attach the code:

import torch
import matplotlib.pyplot as plt
import numpy as np

length_scale, sigma_s, sigma_n = (0.8, 1.0, 0.1)

def secret_function(x, noise=0.0):
    return torch.sin(x) + noise * torch.randn(x.shape)

X = 10 * torch.rand(40, 1) - 4
X = X.reshape(-1,1)
Y = secret_function(X, noise=1e-1)

x = torch.linspace(-8, 8, 100).reshape(-1, 1)
y = secret_function(x)

x_all = torch.cat([X, x],0)
# Note: torch.cdist computes the 2-norm (Euclidean) distance, not the squared distance
K = ((-0.5) * torch.cdist(x_all, x_all, p=2) / length_scale**2).exp()
K = K * sigma_s**2

L1 = X.shape[0]

K_X = K[:L1,:L1]
K_x = K[L1:,L1:] 
K_xX = K[L1:,:L1]
K_Xx = K[:L1,L1:]

K_inv = torch.linalg.inv(K_X + sigma_n**2*torch.eye(L1))
tmp = torch.matmul(K_xX, K_inv)

mu = torch.matmul(tmp, Y)
covar = K_x - torch.matmul(tmp, K_Xx)

var = torch.diagonal(covar, 0)
std = torch.sqrt(var).reshape(-1,1)

plt.figure(figsize=(12, 6))

plt.plot(x.numpy(), mu.numpy(),'b-.')
plt.plot(X.numpy(), Y.numpy(), 'r.')

plt.fill_between(x.numpy().flat, (mu - 2 * std).numpy().flat, (mu + 2 * std).numpy().flat, color="#dddddd")
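
For reference, the mu / covar lines above are the standard GP predictive equations; the NumPy version below implements exactly the same formulas, so as far as I can tell any difference has to come from the kernel matrix K:

$$
\mu_* = K_{xX}\,\bigl(K_X + \sigma_n^2 I\bigr)^{-1} Y,
\qquad
\Sigma_* = K_x - K_{xX}\,\bigl(K_X + \sigma_n^2 I\bigr)^{-1} K_{Xx}
$$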

And here is what I get for the regression result

GP predictive mean and plus/minus 2 std (torch-based)

(Because the distances are not squared, this is effectively an exponential kernel, which makes the predictive mean non-smooth.) Below I try the same regression using the NumPy library.

X = X.numpy()
Y = Y.numpy()
x = x.numpy()
y = y.numpy()

x_all = np.vstack([X,x])
dist2 = (x_all**2).sum(1)[:,None] + (x_all**2).sum(1) - 2*x_all.dot(x_all.T)
K_np = sigma_s**2 * np.exp((-0.5)*dist2/length_scale**2)

K_Xnp = K_np[:L1,:L1]
K_xnp = K_np[L1:,L1:]
K_xXnp = K_np[L1:,:L1]
K_Xxnp = K_np[:L1,L1:]

K_invnp = np.linalg.inv(K_Xnp + sigma_n**2*np.eye(L1))
tmp_np = np.matmul(K_xXnp, K_invnp)

mu_np = np.matmul(tmp_np, Y)
covar_np = K_xnp - np.matmul(tmp_np, K_Xxnp)

var_np = np.diagonal(covar_np, 0)
std_np = np.sqrt(var_np).reshape(-1,1)

plt.figure(figsize=(12, 6))

plt.plot(x, mu_np,'b-.')
#plt.plot(x, y,'k-')
plt.plot(X, Y, 'r.')

plt.fill_between(x.flat, (mu_np - 2 * std_np).flat, (mu_np + 2 * std_np).flat, color="#dddddd")

Now I get a much smoother prediction from the NumPy-based regression.

GP regression with plus-minus 2 std (numpy-based)

What can I do in the Torch version to get the same smooth result as the NumPy one? (Lesson learned: cdist computes the p-norm distance, not the squared distance.)
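
As a quick sanity check of that lesson, here is a minimal standalone snippet (not part of the regression code above) showing that squaring the cdist output reproduces the squared distances used in the NumPy version:

import torch

a = torch.randn(5, 1)

# torch.cdist returns the p-norm (here: Euclidean) distance ...
d_cdist = torch.cdist(a, a, p=2)

# ... so squaring it gives the squared Euclidean distance
d_sq = d_cdist**2

# which matches the expanded-form computation used in the NumPy code
d_expanded = (a**2).sum(1)[:, None] + (a**2).sum(1) - 2 * a @ a.T

print(torch.allclose(d_sq, d_expanded, atol=1e-5))  # expected: True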


Solution

  • It looks like one time you are computing pairwise distances

    torch.cdist(x_all, x_all, p=2)
    

    while the other time you're using squared distances

    dist2 = (x_all**2).sum(1)[:,None] + (x_all**2).sum(1) - 2*x_all.dot(x_all.T)
    

    But note that the latter is about the most expensive and numerically unstable way of computing it; I'd recommend using scipy.spatial.distance.cdist, or at least doing something like the following (a corresponding fix for the Torch kernel is sketched below):

    dist2 = ((x_all[None, :, :] - x_all[:, None, :])**2).sum(axis=-1)
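
    Applied to the Torch code in the question, a minimal sketch of the fix (same variable names as above, assuming the rest of the script is unchanged) is to square the cdist output so that K becomes the intended squared-exponential kernel; alternatively, scipy.spatial.distance.cdist with metric='sqeuclidean' returns squared distances directly (on NumPy arrays):

    # square the Euclidean distances from torch.cdist to get squared distances
    dist2 = torch.cdist(x_all, x_all, p=2)**2
    # squared-exponential (RBF) kernel, matching the NumPy version
    K = sigma_s**2 * torch.exp(-0.5 * dist2 / length_scale**2)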