I am using a LeaveOneGroupOut CV strategy with TPOTRegressor:
from tpot import TPOTRegressor
from sklearn.model_selection import LeaveOneGroupOut
tpot = TPOTRegressor(
    config_dict=regressor_config_dict,
    generations=100,
    population_size=100,
    cv=LeaveOneGroupOut(),
    verbosity=2,
    n_jobs=1)
tpot.fit(XX, yy, groups=groups)
After optimization, the best-scoring trained pipeline is stored in tpot.fitted_pipeline_, and tpot.fitted_pipeline_.predict(X) is available.
My question is: what data will the fitted pipeline (i.e. tpot.fitted_pipeline_) have been trained on?
Additionally, is there a way to access the complete set of trained models corresponding to the set of splits for the winning/optimized pipeline?
TPOT fits the final 'best' pipeline on the full training set, i.e. all of the data passed to the fit function.
It's therefore recommended that your testing data never be passed to the TPOT fit function if you plan to directly interact with the 'best' pipeline via the TPOT object.
If that is an issue for you, you can retrain the pipeline directly via the tpot.fitted_pipeline_ attribute, which is simply a sklearn Pipeline object. Alternatively, you can use the export function to export the 'best' pipeline to its corresponding Python code and interact with the pipeline outside of TPOT.
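As a rough sketch of the retraining option: because the fitted pipeline is an ordinary sklearn Pipeline, you can clone it and refit the copy on whichever subset you like. The data below is synthetic, and best_pipeline is a hypothetical stand-in for tpot.fitted_pipeline_ so that the snippet runs without TPOT installed; in practice you would substitute the real attribute and your own XX, yy, and groups.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption); replace with your own XX, yy, groups.
rng = np.random.default_rng(0)
XX = rng.normal(size=(60, 4))
yy = XX @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=60)
groups = np.repeat(np.arange(6), 10)

# Hypothetical stand-in for tpot.fitted_pipeline_ (any sklearn Pipeline works).
best_pipeline = make_pipeline(StandardScaler(), Ridge(alpha=1.0))

# Hold out one group, then refit an unfitted copy on the remaining groups.
held_out = groups == 5
refit_pipeline = clone(best_pipeline).fit(XX[~held_out], yy[~held_out])

# Score (R^2) and predict on the group the refit copy never saw.
held_out_r2 = refit_pipeline.score(XX[held_out], yy[held_out])
held_out_pred = refit_pipeline.predict(XX[held_out])
```

clone returns an unfitted copy with the same hyperparameters, so the pipeline stored on the TPOT object is left untouched.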
Additionally, is there a way to access the complete set of trained models corresponding to the set of splits for the winning/optimized pipeline?
No. TPOT uses sklearn's cross_val_score when evaluating pipelines, so it throws out the set of trained pipelines from the CV process. However, you can access the scoring results of every pipeline that TPOT evaluated via the tpot.evaluated_individuals_ attribute.
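One possible workaround, sketched here with plain sklearn rather than any TPOT feature, is to rerun the same CV splits on the winning pipeline with cross_validate and return_estimator=True, which keeps one fitted model per split. These are refits, not the exact objects TPOT trained during the search, and they may differ if the pipeline contains unseeded random steps. As before, the data is synthetic and best_pipeline is a hypothetical stand-in for tpot.fitted_pipeline_; the scoring string is my choice for the example.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import Ridge
from sklearn.model_selection import LeaveOneGroupOut, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption); replace with your own XX, yy, groups.
rng = np.random.default_rng(0)
XX = rng.normal(size=(60, 4))
yy = XX @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=60)
groups = np.repeat(np.arange(6), 10)

# Hypothetical stand-in for tpot.fitted_pipeline_.
best_pipeline = make_pipeline(StandardScaler(), Ridge(alpha=1.0))

# Re-run leave-one-group-out CV and keep the estimator fitted on each split.
cv_results = cross_validate(
    clone(best_pipeline),
    XX,
    yy,
    groups=groups,
    cv=LeaveOneGroupOut(),
    scoring="neg_mean_squared_error",
    return_estimator=True,
)

fold_models = cv_results["estimator"]   # one fitted pipeline per held-out group
fold_scores = cv_results["test_score"]  # negated MSE on each held-out group
```

LeaveOneGroupOut yields its splits in sorted order of the unique group labels, so fold_models[i] is the model trained with the i-th group in np.unique(groups) held out.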