I have a bunch of tabular data, and I managed to train a RandomForestClassifier, a GradientBoostingClassifier, and a deep learning model (the tabular learner from fastai) on it. I noticed in the results that each model does better than the others on one particular label, and that label is different for every model. I was wondering if I could put all the models into a VotingClassifier (the one from sklearn). I have no problem with the RandomForestClassifier and the GradientBoostingClassifier, but I didn't find anything about putting the tabular learner inside the VotingClassifier. Is it possible to do that?

This should be possible with a wrapper class that defines __init__, fit, and predict_proba, which are the methods the VotingClassifier relies on (predict_proba is the one it calls when voting='soft'):

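    from fastai.tabular.all import *
    from sklearn.base import BaseEstimator, ClassifierMixin

    class FastAITabularClassifier(BaseEstimator, ClassifierMixin):
        def __init__(self, dls, layers, metrics):
            # Store constructor arguments unchanged so sklearn can clone the estimator
            self.dls = dls
            self.layers = layers
            self.metrics = metrics
            self.learn = None

        def fit(self, X, y):
            # Assumption: dls.new(X, y) builds DataLoaders for the data that
            # VotingClassifier passes in. If your dls does not support this,
            # rebuild them here the same way the original dls was created.
            dls = self.dls.new(X, y)
            self.learn = tabular_learner(dls, layers=self.layers, metrics=self.metrics)
            self.learn.fit_one_cycle(5)
            return self

        def predict_proba(self, X):
            # Unlabeled test DataLoader that reuses the training preprocessing
            dl = self.dls.test_dl(X, with_labels=False)
            # get_preds returns (predictions, targets); targets are unused here
            preds, _ = self.learn.get_preds(dl=dl)
            # Soft voting expects an array of shape (n_samples, n_classes)
            return preds.numpy()
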
Then you should be able to use it with VotingClassifier:

    from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, VotingClassifier

    rf = RandomForestClassifier(n_estimators=100)
    gbm = GradientBoostingClassifier(n_estimators=100)
    # dls is the fastai DataLoaders object you already built for the tabular learner
    fastai_model = FastAITabularClassifier(dls, layers=[200, 100], metrics=accuracy)

    # voting='soft' averages the predict_proba outputs of the three models
    voting_clf = VotingClassifier(estimators=[
        ('rf', rf),
        ('gbm', gbm),
        ('fastai', fastai_model)
    ], voting='soft')

    voting_clf.fit(X_train, y_train)
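
To make it clearer why the wrapper needs predict_proba, here is a rough pure-Python sketch of what soft voting does with those outputs: it takes a (optionally weighted) average of the per-class probabilities from every model and picks the class with the highest average. The function name soft_vote and the probability tables are made up for illustration; sklearn does the same thing internally with NumPy, and the weights argument here mirrors the per-estimator weights parameter of VotingClassifier.

```python
from typing import List, Optional, Sequence


def soft_vote(
    probas: Sequence[Sequence[Sequence[float]]],
    weights: Optional[Sequence[float]] = None,
) -> List[int]:
    """Return the winning class index per sample.

    probas[m][i][c] is the probability that model m assigns to class c
    for sample i. Every model must use the same class (column) order.
    """
    n_models = len(probas)
    if weights is None:
        weights = [1.0] * n_models  # plain average when no weights are given
    total = sum(weights)
    n_samples = len(probas[0])
    n_classes = len(probas[0][0])

    winners = []
    for i in range(n_samples):
        # Weighted average probability of each class across all models
        avg = [
            sum(w * probas[m][i][c] for m, w in enumerate(weights)) / total
            for c in range(n_classes)
        ]
        # The class with the highest averaged probability wins
        winners.append(max(range(n_classes), key=avg.__getitem__))
    return winners


# Hypothetical predict_proba outputs for two samples and two classes
rf_proba = [[0.6, 0.4], [0.2, 0.8]]
gbm_proba = [[0.55, 0.45], [0.6, 0.4]]
nn_proba = [[0.1, 0.9], [0.3, 0.7]]

unweighted = soft_vote([rf_proba, gbm_proba, nn_proba])
weighted = soft_vote([rf_proba, gbm_proba, nn_proba], weights=[3, 3, 1])
```

Two things follow from this. The columns returned by the wrapper's predict_proba have to be in the same class order as the ones from the sklearn models, otherwise the average mixes up labels. And weights apply to a whole model, not to individual labels, so they will not directly express "this model is best on label A", although they can still shift the result, as the weighted call above shows for the first sample.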