If I have for example this snippet of code:
knn = KNeighborsClassifier()
grid_search_knn = GridSearchCV(
    estimator=knn,
    n_jobs=-1)
Do I have to set it like this:
knn = KNeighborsClassifier(random_state=42)
grid_search_knn = GridSearchCV(
    estimator=knn,
    n_jobs=-1
)
Or do I have to set it like this?
knn = KNeighborsClassifier(random_state=42)
grid_search_knn = GridSearchCV(
    estimator=knn,
    random_state=42,
    n_jobs=-1
)
What is the correct way? And what if I use RandomizedSearchCV instead of GridSearchCV?
In this case, setting the random_state depends on the specific algorithm you’re using, rather than on the GridSearchCV or RandomizedSearchCV class.
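If you are unsure which classes take a random_state argument, one way to check is to inspect the constructor signatures. The snippet below is only a sketch: the helper name accepts_random_state is made up for illustration, and RandomForestClassifier is included just as an example of an estimator that uses randomness.

```python
import inspect

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier


def accepts_random_state(cls):
    """Hypothetical helper: True if the constructor has a random_state parameter."""
    return "random_state" in inspect.signature(cls).parameters


# Map each class name to whether its constructor takes random_state.
support = {
    cls.__name__: accepts_random_state(cls)
    for cls in (
        KNeighborsClassifier,
        RandomForestClassifier,
        GridSearchCV,
        RandomizedSearchCV,
    )
}
print(support)

# Passing a keyword the constructor does not define fails at construction time.
try:
    KNeighborsClassifier(random_state=42)
    knn_error = None
except TypeError as exc:
    knn_error = exc
print(type(knn_error).__name__)
```

In the scikit-learn versions I know of, only RandomForestClassifier and RandomizedSearchCV report a random_state parameter here, and the KNeighborsClassifier call raises a TypeError.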
For KNeighborsClassifier, adding random_state is not just unnecessary, it is an error: the classifier has no random_state parameter, so KNeighborsClassifier(random_state=42) raises a TypeError. The algorithm is deterministic, meaning it doesn’t rely on randomness to make predictions, so there is nothing to seed. As a result:
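For KNeighborsClassifier: Don’t set random_state at all, either in the classifier or in GridSearchCV. Neither accepts that parameter. With RandomizedSearchCV, the only random_state worth setting is the one on the search itself, which controls which parameter combinations are sampled.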
For Randomized Algorithms: If you’re using an algorithm that involves randomness, like a decision tree or a random forest, set random_state in the estimator (like RandomForestClassifier(random_state=42)). You can’t set random_state in GridSearchCV, which has no such parameter: it tries every combination in the grid, and its default cross-validation splits are not shuffled, so the search itself is deterministic.
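As a rough illustration of the randomized-estimator case, here is a sketch of a reproducible grid search over a random forest. The dataset (iris), the parameter grid, and the helper name run_search are my own choices for the example, not anything from the question. The shuffled KFold is optional; it is there only to show that if you do want shuffled folds, the seed goes on the splitter passed via cv, not on GridSearchCV.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, KFold

X, y = load_iris(return_X_y=True)


def run_search():
    """Hypothetical helper: build and fit the same seeded search from scratch."""
    # Seed the estimator, since the forest itself is randomized.
    forest = RandomForestClassifier(random_state=42)
    # Optional: shuffled folds need their own seed on the splitter.
    folds = KFold(n_splits=3, shuffle=True, random_state=42)
    search = GridSearchCV(
        estimator=forest,
        param_grid={"n_estimators": [5, 10], "max_depth": [2, 4]},
        cv=folds,
    )
    search.fit(X, y)
    return search


first = run_search()
second = run_search()

# With both seeds fixed, repeated runs should agree exactly.
print(first.best_params_ == second.best_params_)
print(first.best_score_ == second.best_score_)
```

If you remove random_state from RandomForestClassifier, the two runs may start to disagree, which is the effect the seed is there to prevent.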
In summary:
KNeighborsClassifier: No random_state is needed, and none is accepted.
Randomized algorithms: Set random_state in the estimator, not in GridSearchCV.
RandomizedSearchCV: Set random_state there as well if you want the randomized search itself to be reproducible.
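For the RandomizedSearchCV case, a minimal sketch could look like the following. The parameter distributions, n_iter, dataset, and the helper name sampled_candidates are assumptions chosen for the example. The point is only that random_state on the search fixes which candidates get sampled, even when the estimator (here KNeighborsClassifier) has no randomness of its own.

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)


def sampled_candidates(seed):
    """Hypothetical helper: return the parameter settings the search tried."""
    search = RandomizedSearchCV(
        estimator=KNeighborsClassifier(),  # no random_state on the estimator
        param_distributions={
            "n_neighbors": randint(1, 30),
            "weights": ["uniform", "distance"],
        },
        n_iter=5,
        cv=3,
        random_state=seed,  # seeds the sampling of candidates
    )
    search.fit(X, y)
    return search.cv_results_["params"]


run_a = sampled_candidates(42)
run_b = sampled_candidates(42)

# Same seed, same sampled candidates.
print(run_a == run_b)
```

Leaving random_state unset on the search means each run can sample a different set of candidates, so the reported best parameters can change between runs.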