python scikit-learn divide-by-zero

ZeroDivisionError when using sklearn's BaggingClassifier with GridSearchCV


I'm trying to improve a perfectly working Bernoulli Naive Bayes model with bagging.

But when I try to cross-validate the BaggingClassifier, I get a very unexpected ZeroDivisionError coming from parallel.py.

I've tried changing every parameter I know of and restarting Python, but nothing worked.

Here is a reproducible example with a binary-modified iris dataset:

#%% run
import numpy as np

from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import BaggingClassifier
from sklearn.naive_bayes import BernoulliNB
from sklearn.datasets import load_iris


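data = load_iris()
# Binary target: 0 for class 0 (setosa), 1 for the other two classes
data.targetbin = (data.target!=0).astype("int")


# Full grid (overridden just below)
param_grid2={'max_samples' : np.linspace(0.5,1.0,3),
            'base_estimator__alpha':np.linspace(0.1,1,3),
            'base_estimator__binarize':[*np.linspace(0.0,1,3)],
            'base_estimator__fit_prior':[True,False]}


# Reduced grid actually used in this example
param_grid2={'max_samples' :[0.7]}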


clf = GridSearchCV(
        BaggingClassifier(
                BernoulliNB(),
                n_estimators = 10, max_features = 0.5),
        param_grid2,
        scoring = "accuracy",
        verbose=-1)


clf.fit(data.data, data.targetbin)

And here is the stacktrace of my error:

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
Traceback (most recent call last):

  File "<ipython-input-1-dc4eaed2671b>", line 33, in <module>
    clf.fit(data.data, data.targetbin)

  File "C:\Users\Dan\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py", line 722, in fit
    self._run_search(evaluate_candidates)

  File "C:\Users\Dan\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py", line 1191, in _run_search
    evaluate_candidates(ParameterGrid(self.param_grid))

  File "C:\Users\Dan\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py", line 711, in evaluate_candidates
    cv.split(X, y, groups)))

  File "C:\Users\Dan\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py", line 917, in __call__
    if self.dispatch_one_batch(iterator):

  File "C:\Users\Dan\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py", line 759, in dispatch_one_batch
    self._dispatch(tasks)

  File "C:\Users\Dan\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py", line 716, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)

  File "C:\Users\Dan\Anaconda3\lib\site-packages\sklearn\externals\joblib\_parallel_backends.py", line 184, in apply_async
    callback(result)

  File "C:\Users\Dan\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py", line 306, in __call__
    self.parallel.print_progress()

  File "C:\Users\Dan\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py", line 806, in print_progress
    if (is_last_item or cursor % frequency):

ZeroDivisionError: integer division or modulo by zero

What am I doing wrong?


Solution

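  • I debugged the library and found that self.verbose in sklearn/externals/joblib/parallel.py is -1. That value comes from the verbose=-1 you pass to GridSearchCV, which hands it on to joblib's Parallel. joblib expects verbose to be 0 or greater; with a negative value, the frequency that print_progress derives from it can work out to 0, hence the modulo by zero in cursor % frequency. Passing verbose=0 (or any positive integer) avoids the error. Since a negative value crashes instead of being rejected with a clear message, I'd still call it a bug.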
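
A minimal sketch of a workaround, assuming the negative verbose value is what triggers the error: the question's example with verbose=0 instead of -1. The variable name target_bin and the random_state=0 argument are my own additions for illustration and are not part of the original code.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import BernoulliNB

data = load_iris()
# Same binary target as in the question: 0 for class 0, 1 otherwise.
target_bin = (data.target != 0).astype("int")

clf = GridSearchCV(
    BaggingClassifier(
        BernoulliNB(),
        n_estimators=10,
        max_features=0.5,
        random_state=0,  # assumption: added only to make the run repeatable
    ),
    {"max_samples": [0.7]},
    scoring="accuracy",
    verbose=0,  # the only functional change: 0 (silent) instead of -1
)

clf.fit(data.data, target_bin)
print(clf.best_params_)
```

If you want progress messages rather than silence, a positive integer such as verbose=1 should work the same way.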