I see that GridSearchCV picks the best parameters based on cross-validation, but what I really want is to pick them based on a single held-out validation set instead.
I'm not sure if there is a way to do that. I found some similar posts about customizing the cross-validation folds, but again, what I really need is to train on one set and evaluate the parameters on a separate validation set.
One more piece of information: my dataset is basically a pandas Series of text.
I did come up with an answer to my own question by using PredefinedSplit:
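import numpy as np
from sklearn.model_selection import PredefinedSplit

# -1 keeps a sample in the training set for every split; 0 puts it in validation fold 0
train_ind = np.full(len(doc_train), -1)
val_ind = np.zeros(len(doc_val), dtype=int)
ps = PredefinedSplit(test_fold=np.concatenate((train_ind, val_ind)))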
and then pass it as the cv argument to GridSearchCV:
grid_search = GridSearchCV(pipeline, parameters, n_jobs=7, verbose=1, cv=ps)
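For completeness, here is a minimal end-to-end sketch of the same idea. The toy documents, labels, pipeline steps and parameter grid are all made up for illustration and are not from my real setup. The one thing to watch is that fit has to receive the training and validation data concatenated in the same order as test_fold, so the indices line up.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, PredefinedSplit
from sklearn.pipeline import Pipeline

# Hypothetical toy data standing in for doc_train / doc_val and their labels
doc_train = ["good movie", "great film", "bad movie", "awful film"]
y_train = [1, 1, 0, 0]
doc_val = ["good film", "bad film"]
y_val = [1, 0]

# -1 = always in the training set, 0 = member of the single validation fold
test_fold = np.concatenate((
    np.full(len(doc_train), -1),
    np.zeros(len(doc_val), dtype=int),
))
ps = PredefinedSplit(test_fold=test_fold)

# Illustrative pipeline and grid (assumptions, not the original ones)
pipeline = Pipeline([
    ("vect", CountVectorizer()),
    ("clf", LogisticRegression()),
])
parameters = {"clf__C": [0.1, 1.0, 10.0]}

grid_search = GridSearchCV(pipeline, parameters, cv=ps)

# Fit on train + validation, concatenated in the same order as test_fold
grid_search.fit(doc_train + doc_val, y_train + y_val)

print(ps.get_n_splits())
print(grid_search.best_params_)
```

With a pandas Series, pd.concat([doc_train, doc_val]) should play the role of the list concatenation here. Also note that with the default refit=True, the best estimator is refit on everything passed to fit, i.e. training and validation data together. If that is not what you want, passing refit=False and then fitting the pipeline yourself on the training set with best_params_ is one option.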