For hyperparameter tuning, I use the function GridSearchCV from the Python package sklearn. Some of the models that I test require feature scaling (e.g. Support Vector Regression, SVR). Recently, in the Udemy course Machine Learning A-Z™: Hands-On Python & R In Data Science, the instructors mentioned that for SVR the target should also be scaled (if it is not binary). Bearing this in mind, I wonder whether the target is also scaled in each iteration of the cross-validation procedure performed by GridSearchCV, or whether only the features are scaled. Please see the code below, which illustrates the procedure I normally use for hyperparameter tuning with estimators that require the training set to be scaled:
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

def SVRegressor(**kwargs):
    '''Construct a pipeline that scales the features and then performs SVR regression.'''
    return make_pipeline(StandardScaler(), SVR(**kwargs))

params = {'svr__kernel': ["poly", "rbf"]}

grid_search = GridSearchCV(SVRegressor(), params)
grid_search.fit(X, y)
I know that I could simply scale X and y a priori and drop the StandardScaler from the pipeline. However, I want to use this approach in a code pipeline where multiple models are tested, some of which require scaling and others do not, as sketched below. That is why I want to know how GridSearchCV handles scaling under the hood.
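To give a concrete picture, a stripped-down sketch of that multi-model setup could look like the following (the second estimator and both parameter grids are just placeholders for illustration):

from sklearn.ensemble import RandomForestRegressor
# (other imports as in the snippet above)

# Some candidates get a scaler in their pipeline, others are used as-is.
candidates = {
    'svr': (make_pipeline(StandardScaler(), SVR()),
            {'svr__kernel': ["poly", "rbf"]}),
    'rf': (RandomForestRegressor(),
           {'n_estimators': [100, 300]}),
}

for name, (estimator, param_grid) in candidates.items():
    search = GridSearchCV(estimator, param_grid)
    search.fit(X, y)
    print(name, search.best_score_)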
No, it doesn't scale the target. If you look at what make_pipeline builds, the pipeline simply passes the X and y arguments to each transformer, and StandardScaler() does nothing with your y:
def _fit_transform_one(transformer,
                       X,
                       y,
                       weight,
                       message_clsname='',
                       message=None,
                       **fit_params):
    """
    Fits ``transformer`` to ``X`` and ``y``. The transformed result is returned
    with the fitted transformer. If ``weight`` is not ``None``, the result will
    be multiplied by ``weight``.
    """
    with _print_elapsed_time(message_clsname, message):
        if hasattr(transformer, 'fit_transform'):
            res = transformer.fit_transform(X, y, **fit_params)
        else:
            res = transformer.fit(X, y, **fit_params).transform(X)

    if weight is None:
        return res, transformer
    return res * weight, transformer
You can try this with StandardScaler() directly and see that it does nothing with y:
import numpy as np

np.random.seed(111)
X = np.random.normal(5, 2, (100, 3))
y = np.random.normal(5, 2, 100)

# y is accepted but ignored; only the transformed X is returned
res = StandardScaler().fit_transform(X=X, y=y)

res.shape
(100, 3)
res.mean(axis=0)
array([1.01030295e-15, 4.39648318e-16, 8.91509089e-16])
res.std(axis=0)
array([1., 1., 1.])
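As a further sanity check (a small sketch using the same X and y as above), the pipeline's predictions come back on the original scale of y; if y were being scaled inside the pipeline, they would come back near 0 instead:

pipe = make_pipeline(StandardScaler(), SVR())
pipe.fit(X, y)
pipe.predict(X)[:3]   # values near 5, i.e. on the original scale of y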
You can also check the result of your GridSearchCV:
SVRegressor = make_pipeline(StandardScaler(), SVR())
params = {'svr__kernel': ["poly", "rbf"]}
grid_search = GridSearchCV(SVRegressor, params,
scoring='neg_mean_absolute_error')
With unscaled y, you will see that the negative mean absolute error is on roughly the same scale as the standard deviation of y (which was 2 in my example):
grid_search.fit(X, y)
grid_search.cv_results_['mean_test_score']
array([-2.01029707, -1.88779205])
With scaled y, the standard deviation is 1, and you can see the error is accordingly around -1:
y_scaled = StandardScaler().fit_transform(y.reshape(-1,1)).ravel()
grid_search.fit(X, y_scaled)
grid_search.cv_results_['mean_test_score']
array([-1.00585999, -0.88330208])
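If you do want the target standardized inside each cross-validation fold, one option is to wrap the pipeline in sklearn.compose.TransformedTargetRegressor. A minimal sketch (note the regressor__ prefix the wrapper adds to the parameter names):

from sklearn.compose import TransformedTargetRegressor

# Standardize y on each training fold; predictions are automatically
# mapped back to the original scale via the transformer's inverse_transform.
model = TransformedTargetRegressor(
    regressor=make_pipeline(StandardScaler(), SVR()),
    transformer=StandardScaler())

params = {'regressor__svr__kernel': ["poly", "rbf"]}
grid_search = GridSearchCV(model, params,
                           scoring='neg_mean_absolute_error')
grid_search.fit(X, y)

Because predict() inverse-transforms automatically, GridSearchCV still scores on the original scale of y.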