I am building a Torch-based Gaussian process model so that I can use custom kernels and take advantage of automatic derivatives. However, I find that even in the simplest case, the NumPy-based implementation gives a very different result from the Torch-based one.
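For reference, both versions are meant to implement the textbook GP posterior, so they should agree:

$$\mu = K_{xX}\,(K_X + \sigma_n^2 I)^{-1}\,Y, \qquad \Sigma = K_x - K_{xX}\,(K_X + \sigma_n^2 I)^{-1}\,K_{Xx}$$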
Allow me to attach the code:
import torch
import matplotlib.pyplot as plt
import numpy as np
length_scale, sigma_s, sigma_n = (0.8, 1.0, 0.1)
def secret_function(x, noise=0.0):
    return torch.sin(x) + noise * torch.randn(x.shape)
X = 10 * torch.rand(40, 1) - 4
X = X.reshape(-1,1)
Y = secret_function(X, noise=1e-1)
x = torch.linspace(-8, 8, 100).reshape(-1, 1)
y = secret_function(x)
x_all = torch.cat([X, x],0)
# Note: cdist computes the plain 2-norm distance, not the squared distance
K = ((-0.5) * torch.cdist(x_all, x_all, p=2) / length_scale**2).exp()
K = K * sigma_s**2
L1 = X.shape[0]
K_X = K[:L1,:L1]
K_x = K[L1:,L1:]
K_xX = K[L1:,:L1]
K_Xx = K[:L1,L1:]
K_inv = torch.linalg.inv(K_X + sigma_n**2*torch.eye(L1))
tmp = torch.matmul(K_xX, K_inv)
mu = torch.matmul(tmp, Y)
covar = K_x - torch.matmul(tmp, K_Xx)
var = torch.diagonal(covar, 0)
std = torch.sqrt(var).reshape(-1,1)
plt.figure(figsize=(12, 6))
plt.plot(x.numpy(), mu.numpy(),'b-.')
plt.plot(X.numpy(), Y.numpy(), 'r.')
plt.fill_between(x.numpy().flat, (mu - 2 * std).numpy().flat, (mu + 2 * std).numpy().flat, color="#dddddd")
And here is what I get for the regression result:
Because cdist returns the plain 2-norm distance, the code above effectively uses an exponential (absolute-distance) kernel, which makes the predictive mean non-smooth.
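To make the difference explicit (with $\ell$ the length scale): the Torch code above actually computes $k(a,b) = \sigma_s^2 \exp(-\lVert a-b\rVert / (2\ell^2))$, whereas the intended RBF kernel is $k(a,b) = \sigma_s^2 \exp(-\lVert a-b\rVert^2 / (2\ell^2))$. The first is not differentiable at $a = b$, hence the kinks in the posterior mean. Below I try the same regression using NumPy: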
X = X.numpy()
Y = Y.numpy()
x = x.numpy()
y = y.numpy()
x_all = np.vstack([X,x])
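# Squared Euclidean distances via the expansion ||a-b||^2 = ||a||^2 + ||b||^2 - 2*a.b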
dist2 = (x_all**2).sum(1)[:,None] + (x_all**2).sum(1) - 2*x_all.dot(x_all.T)
K_np = sigma_s**2 * np.exp((-0.5)*dist2/length_scale**2)
K_Xnp = K_np[:L1,:L1]
K_xnp = K_np[L1:,L1:]
K_xXnp = K_np[L1:,:L1]
K_Xxnp = K_np[:L1,L1:]
K_invnp = np.linalg.inv(K_Xnp + sigma_n**2*np.eye(L1))
tmp_np = np.matmul(K_xXnp, K_invnp)
mu_np = np.matmul(tmp_np, Y)
covar_np = K_xnp - np.matmul(tmp_np, K_Xxnp)
var_np = np.diagonal(covar_np, 0)
std_np = np.sqrt(var_np).reshape(-1,1)
plt.figure(figsize=(12, 6))
plt.plot(x, mu_np,'b-.')
#plt.plot(x, y,'k-')
plt.plot(X, Y, 'r.')
plt.fill_between(x.flat, (mu_np - 2 * std_np).flat, (mu_np + 2 * std_np).flat, color="#dddddd")
Now I get a much smoother prediction from the NumPy-based regression.
What can I do in the Torch version to get the same smooth result as the NumPy one? (Lesson learned: cdist computes the p-norm distance, not the squared distance.)
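For what it's worth, a quick sanity check (reusing the NumPy arrays built above) confirms that squaring the cdist output reproduces the NumPy squared-distance matrix:

# x_all is the NumPy array at this point; convert back to a tensor for cdist
d = torch.cdist(torch.from_numpy(x_all), torch.from_numpy(x_all), p=2)
print(np.allclose((d**2).numpy(), dist2, atol=1e-4))  # True, up to float32 error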
It looks like in one case you are computing plain pairwise distances,
torch.cdist(x_all, x_all, p=2)
while in the other you're using squared distances:
dist2 = (x_all**2).sum(1)[:,None] + (x_all**2).sum(1) - 2*x_all.dot(x_all.T)
But note that the latter is about the most expensive and most numerically unstable way of computing it; I'd recommend using scipy.spatial.distance.cdist, or at least doing something like
dist2 = ((x_all[None, :, :] - x_all[:, None, :])**2).sum(axis=-1)
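For completeness, a sketch of both suggestions (assuming the arrays from the question). With SciPy you can get the squared distances directly:

from scipy.spatial.distance import cdist
# 'sqeuclidean' returns squared Euclidean distances without the cancellation-prone expansion
dist2 = cdist(x_all, x_all, metric='sqeuclidean')

And on the Torch side it should be enough to square the cdist output (with x_all still a tensor, as in the original code), which turns the exponential kernel back into the intended RBF one:

# Squared distances give the Gaussian/RBF kernel and hence a smooth predictive mean
sq_dist = torch.cdist(x_all, x_all, p=2) ** 2
K = sigma_s**2 * torch.exp(-0.5 * sq_dist / length_scale**2)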