I'm using RandomizedSearchCV with RandomForestClassifier in scikit-learn. I want to make sure my results are reproducible across runs. Where should I set the random_state—in the classifier, in RandomizedSearchCV, or both?
Example code:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV
clf = RandomForestClassifier()
params = {"n_estimators": [50, 100, 200]}  # example search space
search = RandomizedSearchCV(clf, param_distributions=params, n_iter=10)
What's the best practice to ensure consistent results?
You can run a simple test using the starter code from the RandomizedSearchCV examples, in which a random_state is set both in the classifier and in RandomizedSearchCV. Writing a loop with, say, 50 iterations and printing the outcome (that is, .best_params_) shows the following:

- setting random_state both in RandomizedSearchCV and in the classifier/regressor always gives the same outcome
- setting random_state in just one of them produces different outcomes across iterations.

So the conclusion is: if you need reproducibility, you must set this parameter in both places, because each of them uses its own separate random number generator.
It is also worth checking the additional information on how these numbers are used in this post, as well as the official glossary entry for random_state.
The code:
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import uniform

iris = load_iris()

for i in range(50):
    logistic = LogisticRegression(solver='saga', tol=1e-2, max_iter=200, random_state=0)
    distributions = dict(C=uniform(loc=0, scale=4),
                         penalty=['l2', 'l1'])
    clf = RandomizedSearchCV(logistic, distributions, random_state=0)
    search = clf.fit(iris.data, iris.target)
    print(search.best_params_)
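The same pattern applies to the RandomForestClassifier setup from the question: seed both the estimator and the search. A minimal sketch (the search space here is illustrative, not from the question):

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

iris = load_iris()
# Hypothetical distributions; any search space works the same way.
params = {"n_estimators": randint(10, 50), "max_depth": [None, 3, 5]}

results = []
for _ in range(3):
    # random_state in the classifier fixes the trees' bootstrap/feature sampling;
    # random_state in RandomizedSearchCV fixes which candidates are drawn.
    clf = RandomForestClassifier(random_state=0)
    search = RandomizedSearchCV(clf, param_distributions=params,
                                n_iter=5, random_state=0)
    search.fit(iris.data, iris.target)
    results.append(search.best_params_)

# All runs agree because both generators are seeded.
assert all(r == results[0] for r in results)
print(results[0])
```

Dropping either of the two random_state arguments makes the assertion fail across runs, which is exactly the behavior the loop above demonstrates.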