Tags: scikit-learn, k-fold

Does K-Fold cross-validation iteratively train a model?


If you run cross_val_score() or cross_validate() on a dataset, is the estimator trained using all the folds at the end of the run?

I read somewhere that cross_val_score takes a copy of the estimator, whereas I thought this was how you train a model using k-fold.

Or, at the end of cross_validate() or cross_val_score(), do you have a single estimator that you then use for predict()?

Is my thinking correct?


Solution

  • You can refer to the scikit-learn documentation here.

    If you do 3-fold cross-validation, the data is split into three folds; in each of the three rounds, a fresh clone of the estimator is trained on two folds and scored on the remaining one.

    So, after using cross_validate, three separate models will have been trained. If you want the model object from each round, pass the parameter return_estimator=True. The returned dictionary will then have an extra key named estimator containing the list of fitted estimators, one per training round.

    from sklearn import datasets, linear_model
    from sklearn.model_selection import cross_validate
    # Use a small slice of the diabetes dataset as a quick example
    diabetes = datasets.load_diabetes()
    X = diabetes.data[:150]
    y = diabetes.target[:150]
    lasso = linear_model.Lasso()
    cv_results = cross_validate(lasso, X, y, cv=3, return_estimator=True)
    print(sorted(cv_results.keys()))
    # Output: ['estimator', 'fit_time', 'score_time', 'test_score']
    cv_results['estimator']
    # Output: [Lasso(), Lasso(), Lasso()]
    
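    To see that these are independent, already-fitted models (not one model trained on all the folds), you can call predict() on any of them directly. A minimal sketch, reusing the same data as above:

    ```python
    from sklearn import datasets, linear_model
    from sklearn.model_selection import cross_validate

    diabetes = datasets.load_diabetes()
    X = diabetes.data[:150]
    y = diabetes.target[:150]

    cv_results = cross_validate(linear_model.Lasso(), X, y, cv=3,
                                return_estimator=True)

    # Each entry is a separately fitted clone of the Lasso estimator;
    # none of the three was trained on all 150 rows.
    first_model = cv_results['estimator'][0]
    preds = first_model.predict(X[:5])
    print(preds.shape)  # (5,)
    ```

    The original lasso object passed into cross_validate is left unfitted, which is exactly why cross_validate works on copies.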

    However, in practice, cross-validation is used only to evaluate the model. Once you have found a good model and parameter setting that gives a high cross-validation score, it is better to fit the model on the whole training set again and then test it on a held-out test set.
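    That workflow can be sketched as follows (the alpha=0.1 value is only an illustrative choice, not a recommendation):

    ```python
    from sklearn import datasets, linear_model
    from sklearn.model_selection import cross_val_score, train_test_split

    diabetes = datasets.load_diabetes()
    X_train, X_test, y_train, y_test = train_test_split(
        diabetes.data, diabetes.target, random_state=0)

    # 1. Use cross-validation on the training set only,
    #    to judge the model and its parameters.
    scores = cross_val_score(linear_model.Lasso(alpha=0.1),
                             X_train, y_train, cv=5)
    print(scores.mean())

    # 2. Once satisfied, refit one final model on the whole training set...
    final_model = linear_model.Lasso(alpha=0.1).fit(X_train, y_train)

    # 3. ...and report its performance on the held-out test set.
    print(final_model.score(X_test, y_test))
    ```

    The models fitted inside cross_val_score are discarded; only final_model, trained on all of the training data, is used afterwards for predict().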