
How to integrate FastAI classifier into sklearn VotingClassifier?


I have a set of tabular data, and I managed to train a RandomForestClassifier, a GradientBoostingClassifier and a deep learning model (the tabular learner from fastai) on it. I noticed in the results that each model does better than the others on one particular label, and that label is different for every model. I was wondering if I could put all the models into a VotingClassifier (the one from sklearn). I have no problem with the RandomForestClassifier and the GradientBoostingClassifier, but I didn’t find anything about putting the tabular learner inside the VotingClassifier. Is it possible to do that?


Solution

  • This should be possible with a wrapper class that defines __init__, fit and predict_proba, which VotingClassifier requires for soft voting:

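    import numpy as np
    from fastai.tabular.all import *
    from sklearn.base import BaseEstimator, ClassifierMixin
    
    class FastAITabularClassifier(BaseEstimator, ClassifierMixin):
        # sklearn clones estimators, so __init__ only stores its arguments unchanged
        def __init__(self, cat_names, cont_names, layers=None, metrics=None, epochs=5):
            self.cat_names = cat_names
            self.cont_names = cont_names
            self.layers = layers
            self.metrics = metrics
            self.epochs = epochs
    
        def fit(self, X, y):
            # X must be a pandas DataFrame; VotingClassifier passes y already label-encoded
            df = X.reset_index(drop=True)
            df["_target"] = np.asarray(y)
            self.classes_ = np.unique(df["_target"])
            # Assumed preprocessing; swap in the procs used for your original learner
            dls = TabularDataLoaders.from_df(
                df, procs=[Categorify, FillMissing, Normalize],
                cat_names=list(self.cat_names), cont_names=list(self.cont_names),
                y_names="_target", y_block=CategoryBlock())
            self.learn_ = tabular_learner(dls, layers=self.layers, metrics=self.metrics)
            self.learn_.fit_one_cycle(self.epochs)
            return self
    
        def predict_proba(self, X):
            # Apply the preprocessing fitted in fit() to the new rows
            dl = self.learn_.dls.test_dl(X)
            preds, _ = self.learn_.get_preds(dl=dl)
            return preds.numpy()
    
        def predict(self, X):
            # Needed for hard voting and for ClassifierMixin.score
            return self.classes_[self.predict_proba(X).argmax(axis=1)]
    

    Then you should be able to use it with VotingClassifier, passing X_train as a pandas DataFrame:

    from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, VotingClassifier
    
    rf = RandomForestClassifier(n_estimators=100)
    gbm = GradientBoostingClassifier(n_estimators=100)
    
    # cat_names and cont_names are the column lists used for the original tabular learner
    fastai_model = FastAITabularClassifier(cat_names, cont_names, layers=[200, 100], metrics=accuracy)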
    
    voting_clf = VotingClassifier(estimators=[
        ('rf', rf),
        ('gbm', gbm),
        ('fastai', fastai_model)
    ], voting='soft')
    
    voting_clf.fit(X_train, y_train)
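
To make concrete what voting='soft' does with the three predict_proba outputs, here is a minimal pure-Python sketch of the averaging step. It is an illustration only, not scikit-learn's implementation: soft_vote and the probability values are hypothetical, and the real class works on NumPy arrays.

```python
def soft_vote(probas, weights=None):
    """Average per-model class probabilities and pick the top class per row.

    probas: one entry per model; each entry is a list of rows, and each row
            is a list of class probabilities in the same class order.
    """
    weights = weights or [1.0] * len(probas)
    total = sum(weights)
    n_rows, n_classes = len(probas[0]), len(probas[0][0])
    # Weighted mean of the models' probabilities, per row and per class
    averaged = [
        [sum(w * p[i][c] for w, p in zip(weights, probas)) / total
         for c in range(n_classes)]
        for i in range(n_rows)
    ]
    # Predicted label = index of the largest averaged probability
    labels = [max(range(n_classes), key=row.__getitem__) for row in averaged]
    return averaged, labels


# Hypothetical predict_proba outputs for two samples and three classes
rf_proba = [[0.7, 0.2, 0.1], [0.2, 0.5, 0.3]]
gbm_proba = [[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]]
fastai_proba = [[0.5, 0.4, 0.1], [0.1, 0.2, 0.7]]

averaged, labels = soft_vote([rf_proba, gbm_proba, fastai_proba])
print(labels)  # [0, 2]
```

In the second row the random forest alone would pick class 1, but the averaged probabilities favour class 2. This only works if every model returns its probability columns in the same class order, which is why the wrapper's output has to line up with the label encoding used by the other estimators.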