StackingClassifier in sklearn can stack several models. When its .fit method is called, the underlying models are trained as well.
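A typical use case for StackingClassifier:
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
model1 = LogisticRegression()
model2 = RandomForestClassifier()
combination = StackingClassifier([("lr", model1), ("rf", model2)])
combination.fit(X_train, y_train)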
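To make the default behaviour concrete, here is a minimal sketch on synthetic data (make_classification, the sample count, and the estimator names "lr" and "rf" are assumptions for illustration). It shows that, without any prefit option, StackingClassifier fits clones of the base estimators: the objects passed in stay unfitted, and the fitted copies end up in estimators_.

```python
# Sketch: by default StackingClassifier fits *clones* of the base estimators,
# so the objects passed in stay unfitted. Synthetic data is an assumption.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression

X_train, y_train = make_classification(n_samples=300, random_state=0)

model1 = LogisticRegression(max_iter=1000)
model2 = RandomForestClassifier(n_estimators=50, random_state=0)
combination = StackingClassifier([("lr", model1), ("rf", model2)])
combination.fit(X_train, y_train)

# The original objects were never fitted ...
print(hasattr(model1, "coef_"))  # False
# ... the fitted copies live in estimators_ instead.
print(combination.estimators_[0] is model1)  # False
print(hasattr(combination.estimators_[0], "coef_"))  # True
```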
However, what I need is the following:
model1 = LogisticRegression()
model1.fit(X_train_1, y_train_1)
model2 = RandomForestClassifier()
model2.fit(X_train_2, y_train_2)
combination = StackingClassifier([("lr", model1), ("rf", model2)], refit=False)
combination.fit(X_train_3, y_train_3)
where refit=False does not exist; it is what I would need.
I have already trained model1 and model2 and do not want to refit them. I only need to fit the stacking model that combines the two. How can I elegantly combine them into one model that provides an end-to-end .predict?
Of course, I could generate predictions from the first and second models, build a data frame from them, and fit a third model on it. I would like to avoid that, because then I cannot ship the model as a single end-to-end artifact.
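For comparison, the manual workaround might look like the following sketch. The synthetic data, the three disjoint splits, and the helper name meta_features are hypothetical. Note that prediction depends on model1, model2, meta_model and the helper together, which is why this is awkward to hand over as one artifact.

```python
# Sketch of the manual workaround: prefit base models, stack their predicted
# probabilities into a DataFrame, and fit a separate meta-model on it.
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=900, random_state=0)
# Three disjoint splits, mirroring X_train_1/2/3 in the question.
X_train_1, X_train_2, X_train_3 = X[:300], X[300:600], X[600:]
y_train_1, y_train_2, y_train_3 = y[:300], y[300:600], y[600:]

model1 = LogisticRegression(max_iter=1000).fit(X_train_1, y_train_1)
model2 = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train_2, y_train_2)


def meta_features(X):
    # Hypothetical helper: positive-class probability from each base model.
    return pd.DataFrame({
        "lr": model1.predict_proba(X)[:, 1],
        "rf": model2.predict_proba(X)[:, 1],
    })


meta_model = LogisticRegression().fit(meta_features(X_train_3), y_train_3)

# Predicting needs model1, model2, meta_model and meta_features together.
y_pred = meta_model.predict(meta_features(X_train_3))
```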
You're close: it's cv="prefit", not refit=False. From the API docs:
cv : int, cross-validation generator, iterable, or “prefit”, default=None
[...]
"prefit" to assume the estimators are prefit. In this case, the estimators will not be refitted.