I am using GridSearchCV to determine model hyper-parameters:
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV, ShuffleSplit
from sklearn.metrics import make_scorer, f1_score

# feature-extraction step followed by the classification step
pipe = Pipeline(steps=[(self.FE, FE_algorithm), (self.CA, Class_algorithm)])
param_grid = {**FE_grid, **CA_grid}
scorer = make_scorer(f1_score, average='macro')
search = GridSearchCV(pipe, param_grid,
                      cv=ShuffleSplit(test_size=0.20, n_splits=5, random_state=0),
                      n_jobs=-1, verbose=3, scoring=scorer)
search.fit(self.data_input, self.data_output)
However, I believe I am running into some problems with overfitting.
I would like to shuffle the data under every single parameter combination; is there any way to do this? Currently, with the k-fold cross-validation, the same validation sets are evaluated for every parameter combination, so overfitting to those folds is becoming an issue.
No, there isn't. The search splits the data once and creates one task per combination of fold and parameter setting (source).
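For intuition, here is a minimal, self-contained sketch (synthetic placeholder data, not your setup) showing that a splitter with a fixed random_state hands back the same folds every time it is asked, which is why every candidate in the grid ends up scored on identical validation sets:

import numpy as np
from sklearn.model_selection import ShuffleSplit

X = np.zeros((100, 5))  # placeholder data; only the indices matter here
cv = ShuffleSplit(test_size=0.20, n_splits=5, random_state=0)

# Two passes over split() yield identical (train, test) index pairs,
# so each parameter combination is evaluated on the same validation folds.
first = [test for _, test in cv.split(X)]
second = [test for _, test in cv.split(X)]
assert all(np.array_equal(a, b) for a, b in zip(first, second))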
Shuffling per parameter combination is probably not desirable anyway: the selection might then just pick the "easiest" split instead of the "best" parameters. If you think you are overfitting to the validation folds, then consider using a scoring callable that customizes evaluation*.

*my favorite among these, although the computational cost may be too high
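As an illustration of that suggestion, here is a minimal sketch with PCA and LogisticRegression as hypothetical stand-ins for FE_algorithm and Class_algorithm: GridSearchCV also accepts scoring as a plain callable with signature (estimator, X, y), and that callable is where any custom evaluation logic would go.

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, ShuffleSplit
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

def macro_f1(estimator, X_val, y_val):
    # Called once per fold per candidate with the fitted estimator and the
    # held-out fold; replace or extend this with whatever evaluation you need.
    return f1_score(y_val, estimator.predict(X_val), average='macro')

pipe = Pipeline([('fe', PCA()), ('clf', LogisticRegression(max_iter=1000))])
param_grid = {'fe__n_components': [5, 10], 'clf__C': [0.1, 1.0]}

search = GridSearchCV(pipe, param_grid,
                      cv=ShuffleSplit(test_size=0.20, n_splits=5, random_state=0),
                      scoring=macro_f1)
search.fit(X, y)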