machine-learning · scikit-learn · ensemble-learning · mlxtend

Does the number of classifiers in a stacking classifier have to equal the number of columns in my training/testing dataset?


I'm trying to solve a binary classification task. The training dataset contains 9 features, and after my feature engineering I ended up with 14 features. I want to use a stacking approach with mlxtend.classifier.StackingCVClassifier, combining 4 different classifiers, but when trying to predict the test dataset I get the error: ValueError: query data dimension must match training data dimension

%%time
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from mlxtend.classifier import StackingCVClassifier
from xgboost import XGBClassifier

models = [KNeighborsClassifier(weights='distance'), GaussianNB(),
          SGDClassifier(loss='hinge'), XGBClassifier()]
calibrated_models = Calibrated_classifier(models, return_names=False)
meta = LogisticRegression()
stacker = StackingCVClassifier(classifiers=calibrated_models, meta_classifier=meta,
                               use_probas=True).fit(X.values, y.values)

Remark: in my code I just wrote a function that returns a list of calibrated classifiers to pass to StackingCVClassifier; I have checked that it is not causing the error.

Remark 2: I had already tried to build a stacker from scratch, with the same result, so I thought something was wrong with my own stacker.

import pandas as pd
from sklearn.linear_model import LogisticRegression

def StackingClassifier(X, y, models, stacker=LogisticRegression(), return_data=True):
    # extract each model's class name, e.g. 'KNeighborsClassifier'
    names = [str(model)[:str(model).find('(')] for model in models]

    # level-0: one column of positive-class probabilities per base model
    predictions = pd.DataFrame()
    for i, model in enumerate(models):
        model.fit(X, y)
        predictions[names[i]] = model.predict_proba(X)[:, 1]

    if return_data:
        return predictions
    else:
        # level-1: fit the meta-classifier on the base models' predictions
        return stacker.fit(predictions, y)
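For context, here is a minimal sketch (on scikit-learn's binary breast-cancer dataset, not my data; all names are illustrative) of the step a from-scratch stacker also needs at predict time: the same fitted base models must transform the test set into meta-features of the same width the meta-classifier was trained on.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = [KNeighborsClassifier(), GaussianNB()]

def meta_features(fitted_models, X):
    # one column of positive-class probabilities per fitted base model
    return pd.DataFrame({type(m).__name__: m.predict_proba(X)[:, 1]
                         for m in fitted_models})

# fit the base models once on the training data ...
for m in models:
    m.fit(X_train, y_train)

# ... so the *same fitted models* can build meta-features for train and test
stacker = LogisticRegression().fit(meta_features(models, X_train), y_train)
preds = stacker.predict(meta_features(models, X_test))
```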

Could you please help me understand the correct usage of a stacking classifier?


EDIT: This is my code for the calibrated classifiers. The function takes a list of n classifiers, applies sklearn's CalibratedClassifierCV to each one, and returns a list of n calibrated classifiers. There is an option to return them as a zip of (name, classifier) pairs, since the function is mainly intended to be used with sklearn's VotingClassifier.

from sklearn.calibration import CalibratedClassifierCV

def Calibrated_classifier(models, method='sigmoid', return_names=True):
    # extract each model's class name, e.g. 'KNeighborsClassifier'
    names = [str(model)[:str(model).find('(')] for model in models]

    # wrap every base model in a cross-validated probability calibrator
    calibrated = [CalibratedClassifierCV(base_estimator=model, method=method)
                  for model in models]

    if return_names:
        # (name, classifier) pairs, e.g. for sklearn's VotingClassifier
        return zip(names, calibrated)
    else:
        return calibrated
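To illustrate the zipped variant, a minimal usage sketch (on the Iris dataset; the model choice here is just an example) passing the (name, classifier) pairs to sklearn's VotingClassifier might look like:

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

models = [KNeighborsClassifier(), GaussianNB()]
# build the (name, calibrated classifier) pairs VotingClassifier expects
estimators = [(type(m).__name__, CalibratedClassifierCV(m)) for m in models]

# soft voting averages the calibrated predict_proba outputs
voter = VotingClassifier(estimators=estimators, voting='soft').fit(X, y)
print(voter.predict(X[:1]))
```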

Solution

  • I have tried your code with the Iris dataset. It works fine, so I think the problem is with the dimension of your test data, not with the calibration.

    from sklearn.linear_model import LogisticRegression, SGDClassifier
    from sklearn.neighbors import KNeighborsClassifier
    from mlxtend.classifier import StackingCVClassifier
    from sklearn import datasets

    X, y = datasets.load_iris(return_X_y=True)

    models = [KNeighborsClassifier(weights='distance'),
              SGDClassifier(loss='hinge')]
    calibrated_models = Calibrated_classifier(models, return_names=False)
    meta = LogisticRegression(multi_class='ovr')
    stacker = StackingCVClassifier(classifiers=calibrated_models,
                                   meta_classifier=meta, use_probas=True,
                                   cv=3).fit(X, y)
    

    Prediction

    stacker.predict([X[0]])
    #array([0])
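To answer the title question directly: no. The meta-classifier sees one probability column per base classifier, so its input width is set by the number of classifiers, not by the dataset's columns; it is the base classifiers that require the test set to have the same number of columns as the training set. A minimal sketch (Iris again; not your pipeline) showing both facts:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features
base_models = [KNeighborsClassifier(), GaussianNB()]

# level-0: each base model contributes one probability column, so the
# meta-feature matrix has as many columns as base models (2 here),
# independent of the 4 original features
meta_features = np.column_stack(
    [m.fit(X, y).predict_proba(X)[:, 1] for m in base_models])
meta = LogisticRegression().fit(meta_features, y)

# the ValueError comes from the base models: they were fit on 4 columns,
# so querying them with a different width fails
try:
    base_models[0].predict(X[:, :3])  # only 3 of the 4 training columns
except ValueError:
    print("dimension mismatch")
```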