I am learning sklearn and, as a simple exercise, I am fitting a linear SVM. The SVC tries to predict the number of bedrooms in a house from the value of the house and its area.
I have managed to get something that looks ok, but the template I took from matplotlib's documentation uses a color map and I don't know which color corresponds to which number of bedrooms.
How can I add a legend that shows what the color of each scattered point corresponds to, and what each of the SVM's decision regions corresponds to as well?
Also, to make this work, I had to apply preprocessing.scale to my features, and the axis ticks now show the scaled values. How can I undo the scaling, or otherwise retrieve the original values, to use for the tick labels?
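For the scaling part, one possible approach is sketched below. It assumes the features are standardized with sklearn's StandardScaler instead of preprocessing.scale (both subtract the column mean and divide by the column standard deviation), so that the fitted scaler can be kept and inverted. The sample rows and the tick positions are made up for illustration and are not taken from Paros.csv.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up (size, value) rows standing in for the real dataset.
X_raw = np.array([[50.0, 100000.0],
                  [80.0, 180000.0],
                  [120.0, 260000.0],
                  [200.0, 450000.0]])

# Keep the fitted scaler so the transformation can be undone later.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_raw)

# Map scaled rows back to the original units.
X_back = scaler.inverse_transform(X_scaled)

# Convert tick positions on the scaled x axis to original sizes:
# original = scaled * scale_ + mean_ for the chosen column.
scaled_ticks = np.array([-1.0, 0.0, 1.0])  # hypothetical tick positions
size_ticks = scaled_ticks * scaler.scale_[0] + scaler.mean_[0]
```

The converted values could then be used as labels, for example with plt.xticks(scaled_ticks, ['%.0f' % t for t in size_ticks]), while the plot itself stays in scaled coordinates.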
Here is the plot:
https://i.sstatic.net/bigiR.png (I don't have enough reputation to post directly)
And here is my code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import style
from sklearn import preprocessing, svm

style.use('ggplot')
dataset = pd.read_csv('/Path/Paros.csv')
dataset = dataset[dataset['size'] < 3000]
X = np.array(dataset[['size', 'value']])
y = np.array(dataset['bedrooms'])  # 1-D array of class labels
X = preprocessing.scale(X)  # standardize both features
h = 0.01  # step size in the mesh
C = 0.01  # SVM regularization parameter
clf = svm.SVC(kernel='linear', C=C).fit(X, y)
# create a mesh to plot in
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
print("mesh")
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))
# predict the class of every mesh point to draw the decision regions
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap=plt.cm.Paired, alpha=0.8)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired)
plt.xlabel('Size')
plt.ylabel('Price')
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.show()
Adding plt.colorbar() (before the call to plt.show()) did what I was looking for.
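As a rough illustration of both options, the sketch below draws a colorbar with one tick per class and, alternatively, a legend built from one proxy patch per class. The data are made up, the explicit Normalize object and the label text are assumptions added for the example, and only the scattered points are covered, not the contourf regions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
from matplotlib.colors import Normalize
import numpy as np

# Made-up scaled features and bedroom counts standing in for X and y.
rng = np.random.RandomState(0)
X = rng.randn(30, 2)
y = np.tile([1, 2, 3], 10)

classes = np.unique(y)
cmap = plt.cm.Paired
# One shared norm so the scatter, colorbar and legend agree on colors.
norm = Normalize(vmin=classes.min(), vmax=classes.max())

fig, ax = plt.subplots()
points = ax.scatter(X[:, 0], X[:, 1], c=y, cmap=cmap, norm=norm)

# Option 1: a colorbar with one tick per class (create it before plt.show()).
cbar = fig.colorbar(points, ax=ax, ticks=classes)
cbar.set_label("Bedrooms")

# Option 2: a legend with one colored proxy patch per class.
handles = [mpatches.Patch(color=cmap(norm(c)), label="%d bedroom(s)" % c)
           for c in classes]
legend = ax.legend(handles=handles, title="Class")
```

In practice one of the two options is probably enough; the legend tends to read better when there are only a few discrete classes.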