I am running a hyperparameter search to tune a stacking estimator (the StackingClassifier object from sklearn.ensemble). I am using scikit-learn and its RandomizedSearchCV class. The base estimators of the stack are pipelines (Pipeline objects from imblearn.pipeline), and the first step of each pipeline is a ColumnSelector object from the mlxtend library. The search is meant to try a long list of variable combinations, so the parameter distributions cover only the "cols" parameter of each ColumnSelector. The first time I ran this code everything worked well. I then set the project aside and came back after a few days to find it no longer worked. Nothing in the code has changed, but when I call fit on the RandomizedSearchCV object I get the following error:
AttributeError: 'ColumnSelector' object has no attribute 'n_features_in_'
I don't understand what's wrong. I have tried many things, even uninstalling Anaconda, mlxtend, and imblearn and reinstalling the most recent versions, but it keeps raising the same error. I searched on Google but found no information about this.
Can you help me with this issue?
Thanks in advance.
Addendum: the scikit-learn version is 0.23.1, the mlxtend version is 0.17.3, and the imbalanced-learn version is 0.7.0.
The full traceback is below. The object gr2 is the RandomizedSearchCV object that tunes the stacking classifier. Note that if I use the StackingClassifier object from mlxtend instead, everything works fine, but that object does not have the cv parameter that the StackingClassifier from sklearn.ensemble has, and I need it to get better performance (which I had before, when everything was working).
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-94-9d8f412d45a3> in <module>
----> 1 gr2.fit(x_train,y_train)
~\anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
71 FutureWarning)
72 kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 73 return f(**kwargs)
74 return inner_f
75
~\anaconda3\lib\site-packages\sklearn\model_selection\_search.py in fit(self, X, y, groups, **fit_params)
763 refit_start_time = time.time()
764 if y is not None:
--> 765 self.best_estimator_.fit(X, y, **fit_params)
766 else:
767 self.best_estimator_.fit(X, **fit_params)
~\anaconda3\lib\site-packages\sklearn\ensemble\_stacking.py in fit(self, X, y, sample_weight)
423 self._le = LabelEncoder().fit(y)
424 self.classes_ = self._le.classes_
--> 425 return super().fit(X, self._le.transform(y), sample_weight)
426
427 @if_delegate_has_method(delegate='final_estimator_')
~\anaconda3\lib\site-packages\sklearn\ensemble\_stacking.py in fit(self, X, y, sample_weight)
147 for est in all_estimators if est != 'drop'
148 )
--> 149 self.n_features_in_ = self.estimators_[0].n_features_in_
150
151 self.named_estimators_ = Bunch()
~\anaconda3\lib\site-packages\sklearn\pipeline.py in n_features_in_(self)
623 def n_features_in_(self):
624 # delegate to first step (which will call _check_is_fitted)
--> 625 return self.steps[0][1].n_features_in_
626
627 def _sk_visual_block_(self):
AttributeError: 'ColumnSelector' object has no attribute 'n_features_in_'
sklearn has been adding checks for the number of features, with the attribute n_features_in_. It appears mlxtend has not yet added that to its ColumnSelector, and hence the error (noting that sklearn's Pipeline doesn't have its own attribute n_features_in_, instead delegating to the first step, as you can see in the comment in the code at the end of the traceback).
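The delegation can be reproduced without mlxtend. The sketch below uses a hypothetical stand-in, BareSelector, that selects columns but never sets n_features_in_ (it is not mlxtend's ColumnSelector), plus a hypothetical PatchedSelector subclass that records the attribute in fit. It assumes a scikit-learn version in which Pipeline exposes n_features_in_ by delegating to its first step, as the traceback shows for 0.23.1. The same subclassing idea might serve as a stopgap for ColumnSelector, but that is untested here.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


class BareSelector(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in: selects columns, never sets n_features_in_."""

    def __init__(self, cols=None):
        self.cols = cols

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return np.asarray(X)[:, self.cols]


class PatchedSelector(BareSelector):
    """Same selector, but fit() also records the input width."""

    def fit(self, X, y=None):
        self.n_features_in_ = np.asarray(X).shape[1]
        return self


# Toy data: 20 samples, 4 features, two balanced classes.
X = np.random.RandomState(0).rand(20, 4)
y = np.array([0, 1] * 10)

bare = Pipeline([("sel", BareSelector(cols=[0, 1])),
                 ("clf", LogisticRegression())]).fit(X, y)
patched = Pipeline([("sel", PatchedSelector(cols=[0, 1])),
                    ("clf", LogisticRegression())]).fit(X, y)

# The pipeline asks its first step; the bare selector has nothing to report,
# so the attribute lookup on the pipeline fails with AttributeError.
bare_has_attr = hasattr(bare, "n_features_in_")

# With the patched selector the lookup succeeds and reports the input width.
patched_n_features = patched.n_features_in_
```

Both pipelines fit and predict normally; only code that reads n_features_in_ from the pipeline, such as the StackingClassifier line in the traceback, hits the missing attribute.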
Ideally, submit an Issue with mlxtend to add n_features_in_ (and perhaps relevant checks) to ColumnSelector. But in the meantime, a couple of workarounds come to mind:
- mlxtend has a StackingCVClassifier, which is probably preferred to the ordinary StackingClassifier anyway, and has the cv parameter you want. That might never look for the n_features_in_ attribute and so resolve things (as long as the Pipeline never tries to call its getter...).
- sklearn's ColumnTransformer may be preferable to using mlxtend's ColumnSelector. Then you don't need mlxtend at all, it seems.
- An older version of sklearn may be enough, to avoid the n_features_in_ checks altogether.
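As a rough illustration of the ColumnTransformer workaround, here is a minimal sketch using only scikit-learn. The helper column_picker, the estimator names "a" and "b", the toy data, and the candidate column lists are all made up for the example, and it uses sklearn's own Pipeline rather than imblearn's, which I would expect to behave the same way here but have not checked. Each candidate column subset is a ready-made ColumnTransformer, and the search swaps the whole "select" step instead of tuning a "cols" parameter.

```python
from sklearn.compose import ColumnTransformer
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier


def column_picker(cols):
    """Hypothetical helper: keep `cols` unchanged and drop everything else."""
    return ColumnTransformer([("keep", "passthrough", list(cols))],
                             remainder="drop")


# Toy data standing in for x_train / y_train.
X, y = make_classification(n_samples=120, n_features=6, n_informative=4,
                           random_state=0)

pipe_a = Pipeline([("select", column_picker([0, 1, 2])),
                   ("clf", LogisticRegression(max_iter=1000))])
pipe_b = Pipeline([("select", column_picker([3, 4, 5])),
                   ("clf", DecisionTreeClassifier(random_state=0))])

stack = StackingClassifier(
    estimators=[("a", pipe_a), ("b", pipe_b)],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=2,  # the cv parameter the question relies on
)

# Illustrative column subsets; a real search would list many more.
candidate_cols = [[0, 1, 2], [3, 4, 5], [0, 2, 4], [1, 3, 5]]

# "<estimator name>__<step name>" replaces the whole selection step.
param_distributions = {
    "a__select": [column_picker(c) for c in candidate_cols],
    "b__select": [column_picker(c) for c in candidate_cols],
}

search = RandomizedSearchCV(stack, param_distributions, n_iter=3, cv=2,
                            random_state=0)
search.fit(X, y)

# The refit stack reports the width of the full input via its first pipeline.
n_features_seen = search.best_estimator_.n_features_in_
```

The columns each pipeline ended up with can be read back from search.best_params_, which holds the chosen ColumnTransformer for each of the two "select" steps.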