I am trying to run a k-fold cross-validated grid search for a KNN classifier with Python's scikit-learn, searching over the number of neighbors K and the distance metric. I am including mahalanobis and seuclidean as distance metrics, and I understand these need an extra parameter: mahalanobis takes V (the covariance matrix of the features) or VI (its inverse), and seuclidean takes V (a 1-D array of per-feature variances).
Below is my code:
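import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=10, stratify=y)
knn = KNeighborsClassifier()
# rowvar=False: rows are samples and columns are features
grid_param = {'n_neighbors': np.arange(1, 51), 'metric': ['euclidean', 'minkowski', 'mahalanobis', 'seuclidean'], 'metric_params': [{'V': np.cov(X_train, rowvar=False)}]}
knn_gscv = GridSearchCV(knn, grid_param, cv=5)
knn_gscv.fit(X_train, y_train)  # (*)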
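A note on the covariance argument: `np.cov` treats each row as a variable unless `rowvar=False` is given, so for a samples-by-features matrix the feature covariance that mahalanobis expects needs `rowvar=False` (or a transpose). A quick check on random data (hypothetical shapes, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 3))  # hypothetical data: 100 samples, 3 features

# Default rowvar=True: each ROW is a variable, giving a samples x samples matrix
print(np.cov(X_train).shape)  # (100, 100)

# rowvar=False: each COLUMN (feature) is a variable, the shape mahalanobis needs
print(np.cov(X_train, rowvar=False).shape)  # (3, 3)

# Transposing first is equivalent
print(np.allclose(np.cov(X_train.T), np.cov(X_train, rowvar=False)))  # True
```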
The (*) line throws this error when executed:
TypeError: __init__() got an unexpected keyword argument 'V'
I have also tried VI instead of V, but I get the same error.
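A minimal reproduction suggests the keyword name is not the problem: the error comes from the euclidean metric, which accepts no extra parameters at all, while mahalanobis accepts VI. This sketch uses the Iris data as a stand-in for X and y (an assumption), and catches a TypeError for both keywords:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # stand-in for the asker's data
VI = np.linalg.inv(np.cov(X, rowvar=False))

# mahalanobis understands VI, so this fit succeeds
KNeighborsClassifier(metric='mahalanobis', metric_params={'VI': VI}).fit(X, y)

# euclidean takes no extra parameters, so V and VI fail in the same way
errors = {}
for key in ('V', 'VI'):
    try:
        KNeighborsClassifier(metric='euclidean', metric_params={key: VI}).fit(X, y)
    except TypeError as exc:
        errors[key] = str(exc)

print(sorted(errors))
```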
I have come across the potential solutions below, but they don't help:
https://github.com/scikit-learn/scikit-learn/issues/6915
Scikit-learn: How do we define a distance metric's parameter for grid search
Any help is appreciated!
This is also my first question, so any feedback on how I have asked it would be helpful too.
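grid_params = [
    # metrics that accept no extra parameters
    {'n_neighbors': np.arange(1, 51), 'metric': ['euclidean', 'minkowski']},
    # mahalanobis: inverse covariance matrix of the features
    {'n_neighbors': np.arange(1, 51), 'metric': ['mahalanobis'],
     'metric_params': [{'VI': np.linalg.inv(np.cov(X_train, rowvar=False))}]},
    # seuclidean: 1-D array of per-feature variances
    {'n_neighbors': np.arange(1, 51), 'metric': ['seuclidean'],
     'metric_params': [{'V': np.var(X_train, axis=0, ddof=1)}]},
]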
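The issue is that the euclidean and minkowski metrics do not accept a V parameter, yet GridSearchCV combines every metric with every metric_params entry in the same dictionary. Passing a list of dictionaries keeps them separate, so each metric only receives the parameters it accepts.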
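Putting it together, here is a runnable sketch of a split grid on the Iris data (a stand-in for the asker's X and y). It follows the idea of separate dictionaries, gives mahalanobis and seuclidean their own entries because they expect differently shaped parameters, and passes VI for mahalanobis as the more portable choice (our assumption: SciPy's implementation, which some brute-force code paths may use, only takes VI). The K range is shortened to keep it fast, and `error_score='raise'` makes any metric/parameter mismatch fail loudly instead of being scored as NaN. One caveat: the covariance and variances are estimated once on all of X_train, so each CV fold sees statistics that include its own validation part.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # stand-in data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=10, stratify=y)

ks = np.arange(1, 21)  # shortened from 1..50 to keep the example quick

grid_params = [
    # metrics that take no extra parameters
    {'n_neighbors': ks, 'metric': ['euclidean', 'minkowski']},
    # mahalanobis: inverse of the (n_features x n_features) feature covariance
    {'n_neighbors': ks, 'metric': ['mahalanobis'],
     'metric_params': [{'VI': np.linalg.inv(np.cov(X_train, rowvar=False))}]},
    # seuclidean: 1-D array of per-feature variances
    {'n_neighbors': ks, 'metric': ['seuclidean'],
     'metric_params': [{'V': np.var(X_train, axis=0, ddof=1)}]},
]

# error_score='raise' surfaces a bad metric/parameter pairing immediately
knn_gscv = GridSearchCV(KNeighborsClassifier(), grid_params, cv=5,
                        error_score='raise')
knn_gscv.fit(X_train, y_train)

print(knn_gscv.best_params_)
print(knn_gscv.best_score_)
print(knn_gscv.score(X_test, y_test))
```

Each dictionary is expanded into its own Cartesian product, so here the search covers 20 × 2 + 20 + 20 = 80 candidates.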