Tags: python, scikit-learn, yellowbrick

AttributeError: 'KMeans' object has no attribute 'k'


I know there is a nearly identical question [here]. However, its answer doesn't work for me, and another user pointed out in a comment on that answer that it is incorrect. The OP (who also answered their own question) hasn't responded to that comment to explain further. Hence, I'm asking again to get help.

What I tried:

  1. Changing k to n_cluster or n_clusters (as suggested in the comments on the similar question) neither solves the issue nor changes the error.
  2. Storing the result of the elbow_method function in a variable (the fix suggested in that answer) doesn't work either (see my code).

I'm trying to find the optimal number of clusters for KMeans with KElbowVisualizer, using the silhouette coefficient as the metric. Suppose this is the training data:

import numpy as np

data = np.array([[146162.56679954],
       [137227.54181954],
       [126450.29169228],
       [119435.56512675],
       [114988.18682806],
       [111546.74599395],
       [111521.9739634 ],
       [110335.78734103],
       [105098.20650161],
       [ 99178.48409528],
       [ 93982.20860075],
       [ 91453.21097512],
       [ 94160.32926255],
       [102299.29173218],
       [114540.38664748],
       [122133.18759654],
       [121756.94400854],
       [118709.47518003],
       [119216.20443483],
       [122172.5736574 ],
       [122433.8120907 ],
       [120599.22092939],
       [118789.73304299],
       [119107.28063106],
       [123920.58809778],
       [128772.96569855],
       [131502.10371984],
       [129525.67885428],
       [123411.68604418],
       [120263.05106831],
       [114844.47942828],
       [108214.07115472],
       [101822.69619871],
       [ 94871.33385049],
       [ 91251.9375137 ],
       [ 90058.80745747],
       [ 93606.20700239],
       [101044.76675943],
       [109125.2713446 ],
       [112272.386321  ],
       [104429.87179175],
       [ 90827.50408907],
       [ 80805.43033707],
       [ 76165.48417937],
       [ 75002.04576279],
       [ 75428.52404817],
       [ 77444.72355588],
       [ 80389.43621805],
       [ 83401.15424418],
       [ 87638.20462011]])

And this is the code for finding the optimal number of clusters:

from sklearn.cluster import KMeans
from yellowbrick.cluster import KElbowVisualizer

# I store the visualizer in a variable named `visualizer`, as the OP of the similar question suggested in his answer
visualizer = KElbowVisualizer(KMeans(), k=11, metric='silhouette', timings=True)
visualizer.fit(data)

I get this error:

AttributeError                            Traceback (most recent call last)
File ~\Anaconda3\envs\Python3.10\lib\site-packages\IPython\core\formatters.py:343, in BaseFormatter.__call__(self, obj)
    341     method = get_real_method(obj, self.print_method)
    342     if method is not None:
--> 343         return method()
    344     return None
    345 else:

File ~\Anaconda3\envs\Python3.10\lib\site-packages\sklearn\base.py:625, in BaseEstimator._repr_html_inner(self)
    620 def _repr_html_inner(self):
    621     """This function is returned by the @property `_repr_html_` to make
    622     `hasattr(estimator, "_repr_html_") return `True` or `False` depending
    623     on `get_config()["display"]`.
    624     """
--> 625     return estimator_html_repr(self)

File ~\Anaconda3\envs\Python3.10\lib\site-packages\sklearn\utils\_estimator_html_repr.py:385, in estimator_html_repr(estimator)
    383 style_template = Template(_STYLE)
    384 style_with_id = style_template.substitute(id=container_id)
--> 385 estimator_str = str(estimator)
    387 # The fallback message is shown by default and loading the CSS sets
    388 # div.sk-text-repr-fallback to display: none to hide the fallback message.
    389 #
   (...)
    394 # The reverse logic applies to HTML repr div.sk-container.
    395 # div.sk-container is hidden by default and the loading the CSS displays it.
    396 fallback_msg = (
    397     "In a Jupyter environment, please rerun this cell to show the HTML"
    398     " representation or trust the notebook. <br />On GitHub, the"
    399     " HTML representation is unable to render, please try loading this page"
    400     " with nbviewer.org."
    401 )

File ~\Anaconda3\envs\Python3.10\lib\site-packages\sklearn\base.py:279, in BaseEstimator.__repr__(self, N_CHAR_MAX)
    271 # use ellipsis for sequences with a lot of elements
    272 pp = _EstimatorPrettyPrinter(
    273     compact=True,
    274     indent=1,
    275     indent_at_name=True,
    276     n_max_elements_to_show=N_MAX_ELEMENTS_TO_SHOW,
    277 )
--> 279 repr_ = pp.pformat(self)
    281 # Use bruteforce ellipsis when there are a lot of non-blank characters
    282 n_nonblank = len("".join(repr_.split()))

File ~\Anaconda3\envs\Python3.10\lib\pprint.py:157, in PrettyPrinter.pformat(self, object)
    155 def pformat(self, object):
    156     sio = _StringIO()
--> 157     self._format(object, sio, 0, 0, {}, 0)
    158     return sio.getvalue()

File ~\Anaconda3\envs\Python3.10\lib\pprint.py:174, in PrettyPrinter._format(self, object, stream, indent, allowance, context, level)
    172     self._readable = False
    173     return
--> 174 rep = self._repr(object, context, level)
    175 max_width = self._width - indent - allowance
    176 if len(rep) > max_width:

File ~\Anaconda3\envs\Python3.10\lib\pprint.py:454, in PrettyPrinter._repr(self, object, context, level)
    453 def _repr(self, object, context, level):
--> 454     repr, readable, recursive = self.format(object, context.copy(),
    455                                             self._depth, level)
    456     if not readable:
    457         self._readable = False

File ~\Anaconda3\envs\Python3.10\lib\site-packages\sklearn\utils\_pprint.py:189, in _EstimatorPrettyPrinter.format(self, object, context, maxlevels, level)
    188 def format(self, object, context, maxlevels, level):
--> 189     return _safe_repr(
    190         object, context, maxlevels, level, changed_only=self._changed_only
    191     )

File ~\Anaconda3\envs\Python3.10\lib\site-packages\sklearn\utils\_pprint.py:440, in _safe_repr(object, context, maxlevels, level, changed_only)
    438 recursive = False
    439 if changed_only:
--> 440     params = _changed_params(object)
    441 else:
    442     params = object.get_params(deep=False)

File ~\Anaconda3\envs\Python3.10\lib\site-packages\sklearn\utils\_pprint.py:93, in _changed_params(estimator)
     89 def _changed_params(estimator):
     90     """Return dict (param_name: value) of parameters that were given to
     91     estimator with non-default values."""
---> 93     params = estimator.get_params(deep=False)
     94     init_func = getattr(estimator.__init__, "deprecated_original", estimator.__init__)
     95     init_params = inspect.signature(init_func).parameters

File ~\Anaconda3\envs\Python3.10\lib\site-packages\yellowbrick\base.py:342, in ModelVisualizer.get_params(self, deep)
    334 def get_params(self, deep=True):
    335     """
    336     After v0.24 - scikit-learn is able to determine that ``self.estimator`` is
    337     nested and fetches its params using ``estimator__param``. This functionality is
   (...)
    340     the estimator params.
    341     """
--> 342     params = super(ModelVisualizer, self).get_params(deep=deep)
    343     for param in list(params.keys()):
    344         if param.startswith("estimator__"):

File ~\Anaconda3\envs\Python3.10\lib\site-packages\sklearn\base.py:211, in BaseEstimator.get_params(self, deep)
    209 out = dict()
    210 for key in self._get_param_names():
--> 211     value = getattr(self, key)
    212     if deep and hasattr(value, "get_params"):
    213         deep_items = value.get_params().items()

File ~\Anaconda3\envs\Python3.10\lib\site-packages\yellowbrick\utils\wrapper.py:42, in Wrapper.__getattr__(self, attr)
     40 def __getattr__(self, attr):
     41     # proxy to the wrapped object
---> 42     return getattr(self._wrapped, attr)

AttributeError: 'KMeans' object has no attribute 'k'

The error is followed by a figure:
[Figure: plot rendered below the traceback]
Interestingly, it renders a plot that I didn't request at all! Also, I set timings=True, but the plot shows no timing information. So maybe the algorithm didn't run at all (but then I don't know how it renders the result!). So where is the problem?

Additional information:
yellowbrick version = 1.4
scikit-learn version = 1.1.1

I tried this in both VSCode and Jupyter Notebook (Anaconda); the results are the same.
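Reading the traceback from the bottom up, the error does not seem to come from fitting at all. IPython tries to render the value returned by visualizer.fit(data), which is the visualizer itself; scikit-learn's repr calls get_params(), which reads one attribute per __init__ parameter name; and because the visualizer has no attribute k, yellowbrick's Wrapper.__getattr__ forwards the lookup to the wrapped KMeans, which raises. That would also explain why a plot appears: fit() finished and drew it before the display step failed. The stdlib-only sketch below reproduces the chain with hypothetical toy classes (ToyKMeans, Wrapper and ToyElbowVisualizer are illustrations, not yellowbrick's code), assuming, as the traceback implies, that yellowbrick 1.4 does not store its k argument as self.k.

```python
import inspect


class ToyKMeans:
    """Hypothetical stand-in for sklearn's KMeans: has n_clusters, no attribute `k`."""

    def __init__(self, n_clusters=8):
        self.n_clusters = n_clusters


class Wrapper:
    """Mimics the idea of yellowbrick's Wrapper: unknown attributes go to the wrapped model."""

    def __init__(self, wrapped):
        self._wrapped = wrapped

    def __getattr__(self, attr):
        # Only called when normal attribute lookup fails (e.g. for `k` below).
        return getattr(self._wrapped, attr)


class ToyElbowVisualizer(Wrapper):
    def __init__(self, estimator, k=10, metric="distortion"):
        super().__init__(estimator)
        self.estimator = estimator
        self.metric = metric
        # Assumption (per the traceback): `k` is NOT stored as self.k.
        self.k_values_ = list(range(2, k + 1))

    def get_params(self):
        # Simplified sklearn BaseEstimator.get_params: one getattr per __init__ parameter.
        names = [p for p in inspect.signature(type(self).__init__).parameters if p != "self"]
        return {name: getattr(self, name) for name in names}

    def fit(self, X):
        self.fitted_ = True  # the real clustering and drawing would happen here
        return self          # fit() returns the visualizer itself

    def __repr__(self):
        # sklearn's repr (used by Jupyter's HTML display) calls get_params().
        return f"{type(self).__name__}({self.get_params()})"


viz = ToyElbowVisualizer(ToyKMeans(), k=11)
viz.fit([[1.0], [2.0]])  # fitting succeeds
try:
    repr(viz)            # what the notebook does with a cell's last value
except AttributeError as exc:
    print("repr failed:", exc)
```

In the toy, adding self.k = k to __init__ makes repr(viz) succeed; I would expect the v1.5 fix to be along those lines, but I have not checked the actual patch.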


Solution

  • I found a workaround, but I need help understanding why defining a class for this purpose works. Code:

    class find_n_cluster:
        def __init__(self, train_data, metric) -> None:
            self.data = train_data
            self.metric = metric
            self.visualizer = KElbowVisualizer(KMeans(), k=11, metric=self.metric, timings=True)
    
        def fit(self):
            self.visualizer.fit(self.data)
    
    model = find_n_cluster(data, 'silhouette')
    model.fit()
    

    Result (unexpected, because a plot is displayed):
    [Figure: output plot]

    Again, I didn't request any plot! Also, I don't understand how defining a class for this job avoids the error.
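A likely reason the class helps: Jupyter, like the plain Python REPL, renders only the value of a cell's last expression and skips it when that value is None. visualizer.fit(data) returns the visualizer, whose repr fails; model.fit() in the class version returns None, so nothing is rendered. The stdlib sketch below uses sys.__displayhook__ (the REPL's original display hook) as a stand-in for IPython's display machinery; FragileVisualizer and FindNCluster are hypothetical classes mirroring the question's code.

```python
import contextlib
import io
import sys


class FragileVisualizer:
    """Hypothetical stand-in: fit() works, but repr() fails as in yellowbrick 1.4."""

    def fit(self, X):
        return self  # returning self means the REPL/notebook will try to display it

    def __repr__(self):
        raise AttributeError("'KMeans' object has no attribute 'k'")


class FindNCluster:
    """Same idea as the question's find_n_cluster: fit() returns None."""

    def __init__(self, data):
        self.data = data
        self.visualizer = FragileVisualizer()

    def fit(self):
        self.visualizer.fit(self.data)  # result is discarded, so fit() returns None


def show_last_value(value):
    """Mimic the REPL displaying an expression's value; return what was printed."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        sys.__displayhook__(value)  # ignores None, otherwise prints repr(value)
    return buf.getvalue()


data = [[1.0], [2.0]]
print(repr(show_last_value(FindNCluster(data).fit())))  # None is never displayed

try:
    show_last_value(FragileVisualizer().fit(data))
except AttributeError as exc:
    print("display failed:", exc)
```

In IPython specifically, ending the last line with a semicolon (visualizer.fit(data);) or assigning the result (_ = visualizer.fit(data)) also suppresses display of that value, so either should avoid the error without a wrapper class.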


    Update
    I solved the plot-display issue by adding plt.close() after self.visualizer.fit(self.data):

    import matplotlib.pyplot as plt
    
    class find_n_cluster:
        def __init__(self, train_data, metric) -> None:
            self.data = train_data
            self.metric = metric
            self.visualizer = KElbowVisualizer(KMeans(), k=11, metric=self.metric, timings=True)
    
        def fit(self):
            self.visualizer.fit(self.data)
            plt.close()  # close the current figure so Jupyter doesn't display it
    
    model = find_n_cluster(data, 'silhouette')
    model.fit()
    

    Now nothing is returned or displayed, as expected. But I still don't know how defining the class solved the error.
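If only the chosen k is needed and not the plot, a comparable silhouette comparison can be run with scikit-learn alone, which sidesteps both the repr error and the automatic figure. This is a minimal sketch, not what KElbowVisualizer does internally: it simply picks the k with the highest mean silhouette score, whereas yellowbrick applies its own elbow-detection step. The candidate range 2 to 10, n_init=10, random_state=0, and the stand-in data (the first 16 rows of the question's array) are my assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Stand-in: first 16 rows of the question's `data`; use the full (50, 1) array in practice.
data = np.array([
    146162.56679954, 137227.54181954, 126450.29169228, 119435.56512675,
    114988.18682806, 111546.74599395, 111521.9739634, 110335.78734103,
    105098.20650161, 99178.48409528, 93982.20860075, 91453.21097512,
    94160.32926255, 102299.29173218, 114540.38664748, 122133.18759654,
]).reshape(-1, 1)

scores = {}
for k in range(2, 11):  # candidate cluster counts (assumed range)
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(data)
    scores[k] = silhouette_score(data, labels)  # mean silhouette coefficient

best_k = max(scores, key=scores.get)  # highest mean silhouette wins
for k, s in scores.items():
    print(f"k={k}: silhouette={s:.3f}")
print("best k:", best_k)
```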


    Update 8/23/2022

    According to this, yellowbrick was updated to v1.5, which fixes AttributeError: 'KMeans' object has no attribute 'k'. However, the plot is still displayed automatically in Jupyter Notebook.
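The remaining automatic plot most likely comes from matplotlib's inline backend rather than from yellowbrick: the visualizer draws on pyplot's current axes during fit(), and Jupyter's inline backend displays figures that pyplot still has open when the cell finishes, which is why plt.close() suppresses it. The sketch below only demonstrates the pyplot side, using the non-interactive Agg backend so it runs outside a notebook; draw_something() is a hypothetical stand-in for visualizer.fit(data).

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs outside Jupyter
import matplotlib.pyplot as plt


def draw_something():
    """Hypothetical stand-in for visualizer.fit(data): draws on pyplot's current axes."""
    ax = plt.gca()  # implicitly creates a figure if none is open
    ax.plot([2, 3, 4], [0.5, 0.6, 0.4])
    return ax


ax = draw_something()
print("open figures after drawing:", plt.get_fignums())  # what an inline backend would show
plt.close(ax.figure)  # close it so nothing is left to display at the end of the cell
print("open figures after closing:", plt.get_fignums())
```

With yellowbrick itself, a variant is to create the figure explicitly with fig, ax = plt.subplots(), pass ax=ax to KElbowVisualizer, and call plt.close(fig) after fitting; the fitted visualizer's attributes (for example elbow_value_) remain available after the figure is closed.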