python, pandas, dataframe, scikit-learn, adaboost

How to use AdaBoost on multiple different types of fitted classifiers (like SVM, Decision Tree, Neural Network, etc.)?


I'm working on a classification problem and have multiple fitted sklearn classifiers, like

from sklearn import tree
from sklearn.svm import SVC

svm = SVC().fit(X_train, y_train)
dt = tree.DecisionTreeClassifier(criterion='entropy', max_depth=4000).fit(X_train, y_train)

...

for i in range(num_of_models):
    m2 = create_model_for_ensemble(dummy_y_train.shape[1])
    m2.fit(X_train_array[i], dummy_y_train, epochs=150, batch_size=100, verbose=0)
    models.append(m2)
# m2 is a custom neural network classifier with its own predict function (m2.predict_classes).
# The code above is just an example; the point is that m2 is also a classifier.

... etc.

These all get the same inputs and produce the same type of output: each can predict a label for a row of my data:

  label attribute_1 attribute_2  ... attribute_79
1     ?    0.199574    0.203156  ...     0.046898   
2     ?    0.201461    0.203837  ...     0.075002   
3     ?    0.209044    0.214268  ...     0.143278
...   ...       ...         ...  ...          ...

Where label is a whole number ranging from 0 to 29.

My goal is to build an AdaBoost classifier that includes all of the above (svm, dt, m2), but I haven't been able to find an example on Google. Every example I find boosts either multiple decision trees or multiple classifiers that differ only in configuration, not in type.

I know it can be done: for each row (or data point) of my dataframe, the weights of each classifier have to be adjusted, and that doesn't require all of them to be the same type of classifier. They all just need a .predict method.

So how do I go about doing this? Can anyone give me an example?


Solution

  • To include all of the classifiers [svm, dt, m2], first combine them into a single ensemble model, then pass that ensemble to AdaBoost as its base estimator.

    Try something like this:

    from sklearn import datasets
    from sklearn.ensemble import AdaBoostClassifier, VotingClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier

    # Load the data first so that X and y exist before they are split.
    iris = datasets.load_iris()
    X, y = iris.data[:, 1:3], iris.target

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Soft voting averages predicted probabilities, so SVC needs probability=True.
    voting_clf = VotingClassifier(
        [('clf1', SVC(probability=True)), ('clf2', DecisionTreeClassifier())],
        voting='soft',
    )

    # scikit-learn >= 1.2 calls this argument `estimator`; older releases use `base_estimator`.
    ada_boost_clf = AdaBoostClassifier(estimator=voting_clf)
    ada_boost_clf.fit(X_train, y_train)
    print(ada_boost_clf.score(X_test, y_test))
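
One caveat with this approach: AdaBoostClassifier reweights the training rows on every round, so the base estimator's fit method has to accept a sample_weight argument, and VotingClassifier in turn forwards those weights to each of its members. The sketch below is one way to check candidates up front using scikit-learn's has_fit_parameter helper. KNeighborsClassifier is not part of the question; it is included only as an assumed example of a classifier whose fit does not take sample_weight. A custom neural network such as m2 would presumably need a scikit-learn compatible wrapper that exposes sample_weight in fit before it could join the ensemble.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.validation import has_fit_parameter

# Candidate members for the voting ensemble.
candidates = {
    "svm": SVC(probability=True),
    "dt": DecisionTreeClassifier(),
    "knn": KNeighborsClassifier(),  # illustrative only; not in the question
}

# True where the estimator's fit() signature includes sample_weight.
supports_weights = {
    name: has_fit_parameter(clf, "sample_weight")
    for name, clf in candidates.items()
}

for name, ok in supports_weights.items():
    status = "accepts" if ok else "does not accept"
    print(f"{name}: fit() {status} sample_weight")
```

Members that fail this check would have to be left out of the voting ensemble or wrapped so that they can make use of the per-row weights.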