I want to implement a binary classification model using a Gaussian process. Following the official documentation, I wrote the code below.
X has 2048 features and Y is either 0 or 1. After optimizing the model, I tried to evaluate its performance.
However, the predict_y method yields a strange result. I expected pred to have shape (n_test_samples, 2), representing the probabilities of class 0 and class 1, but the result I got has shape (n_test_samples, n_training_samples).
What is going wrong?
import gpflow
from sklearn.metrics import accuracy_score, cohen_kappa_score, matthews_corrcoef


def model(X, Y):
    '''
    X: (n_training_samples, n_features), in my example (n, 2048)
    Y: (n_training_samples,), binary labels
    '''
    m = gpflow.models.VGP(
        (X, Y),
        likelihood=gpflow.likelihoods.Bernoulli(),
        kernel=gpflow.kernels.SquaredExponential(),
    )
    opt = gpflow.optimizers.Scipy()
    opt.minimize(m.training_loss, variables=m.trainable_variables)
    return m


def evaluate(model, X, Y, accuracy, MCC, Kappa):
    '''
    X: (n_test_samples, n_features), in my example (n, 2048)
    Y: (n_test_samples,), binary labels
    '''
    pred, _ = model.predict_y(X)
    # Weird result here: (n_test_samples, n_training_samples)
    print('pred.shape is {}'.format(pred.shape))
    accuracy += [accuracy_score(Y, pred)]
    MCC += [matthews_corrcoef(Y, pred)]
    Kappa += [cohen_kappa_score(Y, pred)]
    return accuracy, MCC, Kappa
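I finally figured it out. The Y passed to the VGP model must have shape (n_training_samples, 1) rather than (n_training_samples,). As far as I can tell, GPflow infers the number of latent GPs from the last dimension of Y, so a 1-D Y of length n_training_samples produces n_training_samples outputs per test point. Reshaping with Y.reshape(-1, 1) before building the model fixes this; predict_y then returns a mean of shape (n_test_samples, 1) holding the probability of class 1.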
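The shape handling can be sketched with NumPy alone. This is only an illustration: the helper names as_column and to_labels are hypothetical, and fake_mean is a made-up stand-in for the mean returned by predict_y, assumed here to be the probability of class 1 with shape (n_test_samples, 1).

```python
import numpy as np


def as_column(y):
    """Return labels as a float column vector of shape (n_samples, 1)."""
    y = np.asarray(y, dtype=np.float64)
    return y.reshape(-1, 1)


def to_labels(prob, threshold=0.5):
    """Turn P(y=1) of shape (n, 1) or (n,) into 0/1 labels of shape (n,)."""
    prob = np.asarray(prob)
    return (prob.reshape(-1) >= threshold).astype(int)


# 1-D labels, as in the question.
Y_train = np.array([0, 1, 1, 0])
print(as_column(Y_train).shape)  # (4, 1)

# Made-up probabilities standing in for the predict_y mean.
fake_mean = np.array([[0.1], [0.8], [0.5], [0.3]])
print(to_labels(fake_mean))  # [0 1 1 0]
```

In the code from the question, this would mean passing (X, as_column(Y)) to gpflow.models.VGP, and handing to_labels(pred) rather than pred to accuracy_score, matthews_corrcoef, and cohen_kappa_score, since those metrics expect hard class labels instead of probabilities.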