I am using GridSearchCV to determine model hyperparameters:
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import GridSearchCV, ShuffleSplit
from sklearn.pipeline import Pipeline

pipe = Pipeline(steps=[(self.FE, FE_algorithm), (self.CA, Class_algorithm)])
param_grid = {**FE_grid, **CA_grid}
scorer = make_scorer(f1_score, average='macro')
cv = ShuffleSplit(n_splits=5, test_size=0.20, random_state=0)
search = GridSearchCV(pipe, param_grid, cv=cv, scoring=scorer, n_jobs=-1, verbose=3)
search.fit(self.data_input, self.data_output)
However, I believe I am running into problems with overfitting: [figure: results]
I would like to reshuffle the data for every single parameter combination. Is there any way to do this? Currently, the cross-validation evaluates the same validation sets for every parameter combination, so overfitting is becoming an issue.
No, there isn't. The search splits the data once and creates one task for each pairing of fold and parameter combination (source).
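To make the "split once, then pair every fold with every candidate" behaviour concrete, here is a minimal pure-Python sketch. It is a simplified model and not scikit-learn's actual implementation: `shuffle_splits`, `build_tasks`, and the `C` grid are hypothetical names introduced only for illustration.

```python
import random
from itertools import product


def shuffle_splits(n_samples, n_splits, test_size, seed):
    """Hypothetical stand-in for a ShuffleSplit-style splitter with a fixed seed."""
    rng = random.Random(seed)
    n_test = int(round(n_samples * test_size))
    splits = []
    for _ in range(n_splits):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        # First n_test shuffled indices form the validation set, the rest train.
        splits.append((indices[n_test:], indices[:n_test]))
    return splits


def build_tasks(candidates, splits):
    """Create one task per (candidate, split) pair; all candidates share the splits."""
    return [
        (params, train, test)
        for params, (train, test) in product(candidates, splits)
    ]


# Hypothetical parameter grid, for illustration only.
candidates = [{"C": 0.1}, {"C": 1.0}, {"C": 10.0}]

# The splits are computed once, before any candidate is evaluated.
splits = shuffle_splits(n_samples=20, n_splits=5, test_size=0.20, seed=0)
tasks = build_tasks(candidates, splits)

# Collect the validation folds each candidate is scored on.
folds_per_candidate = {}
for params, _train, test in tasks:
    folds_per_candidate.setdefault(params["C"], []).append(tuple(test))

# Every candidate sees exactly the same validation folds.
same_folds = len({tuple(folds) for folds in folds_per_candidate.values()}) == 1
print(len(tasks), same_folds)
```

In this model, reshuffling per candidate would mean calling the splitter again inside the loop over candidates, which is exactly what the search does not do.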
Shuffling per parameter combination is probably not desirable anyway: the selection might then just pick the "easiest" split instead of the "best" parameter. If you think you are overfitting to the validation folds, then consider using
