I know there is a very similar question [here]. But its answer doesn't work for me, and another user found it incorrect, as stated in a comment on the answer. The OP (who also answered his own question) hasn't responded to that comment to explain more. Hence, I'm asking this again to get help.
What I tried: changing k to n_cluster or n_clusters (as suggested in the similar question's comments section) neither solves the issue nor changes the error!

I'm trying to find the optimal number of clusters in KMeans clustering using the silhouette coefficient with KElbowVisualizer. Suppose this is the training data:
import numpy as np
data = np.array([[146162.56679954],
[137227.54181954],
[126450.29169228],
[119435.56512675],
[114988.18682806],
[111546.74599395],
[111521.9739634 ],
[110335.78734103],
[105098.20650161],
[ 99178.48409528],
[ 93982.20860075],
[ 91453.21097512],
[ 94160.32926255],
[102299.29173218],
[114540.38664748],
[122133.18759654],
[121756.94400854],
[118709.47518003],
[119216.20443483],
[122172.5736574 ],
[122433.8120907 ],
[120599.22092939],
[118789.73304299],
[119107.28063106],
[123920.58809778],
[128772.96569855],
[131502.10371984],
[129525.67885428],
[123411.68604418],
[120263.05106831],
[114844.47942828],
[108214.07115472],
[101822.69619871],
[ 94871.33385049],
[ 91251.9375137 ],
[ 90058.80745747],
[ 93606.20700239],
[101044.76675943],
[109125.2713446 ],
[112272.386321 ],
[104429.87179175],
[ 90827.50408907],
[ 80805.43033707],
[ 76165.48417937],
[ 75002.04576279],
[ 75428.52404817],
[ 77444.72355588],
[ 80389.43621805],
[ 83401.15424418],
[ 87638.20462011]])
And the following code is related to finding the optimal number of clusters:
from sklearn.cluster import KMeans
from yellowbrick.cluster import KElbowVisualizer
# Here, I store the result in a variable named `visualizer` (as the OP of the similar question suggested in his answer)
visualizer = KElbowVisualizer(KMeans(), k=11, metric='silhouette', timings=True)
visualizer.fit(data)
I get this error:
AttributeError Traceback (most recent call last)
File ~\Anaconda3\envs\Python3.10\lib\site-packages\IPython\core\formatters.py:343, in BaseFormatter.__call__(self, obj)
341 method = get_real_method(obj, self.print_method)
342 if method is not None:
--> 343 return method()
344 return None
345 else:
File ~\Anaconda3\envs\Python3.10\lib\site-packages\sklearn\base.py:625, in BaseEstimator._repr_html_inner(self)
620 def _repr_html_inner(self):
621 """This function is returned by the @property `_repr_html_` to make
622 `hasattr(estimator, "_repr_html_") return `True` or `False` depending
623 on `get_config()["display"]`.
624 """
--> 625 return estimator_html_repr(self)
File ~\Anaconda3\envs\Python3.10\lib\site-packages\sklearn\utils\_estimator_html_repr.py:385, in estimator_html_repr(estimator)
383 style_template = Template(_STYLE)
384 style_with_id = style_template.substitute(id=container_id)
--> 385 estimator_str = str(estimator)
387 # The fallback message is shown by default and loading the CSS sets
388 # div.sk-text-repr-fallback to display: none to hide the fallback message.
389 #
(...)
394 # The reverse logic applies to HTML repr div.sk-container.
395 # div.sk-container is hidden by default and the loading the CSS displays it.
396 fallback_msg = (
397 "In a Jupyter environment, please rerun this cell to show the HTML"
398 " representation or trust the notebook. <br />On GitHub, the"
399 " HTML representation is unable to render, please try loading this page"
400 " with nbviewer.org."
401 )
File ~\Anaconda3\envs\Python3.10\lib\site-packages\sklearn\base.py:279, in BaseEstimator.__repr__(self, N_CHAR_MAX)
271 # use ellipsis for sequences with a lot of elements
272 pp = _EstimatorPrettyPrinter(
273 compact=True,
274 indent=1,
275 indent_at_name=True,
276 n_max_elements_to_show=N_MAX_ELEMENTS_TO_SHOW,
277 )
--> 279 repr_ = pp.pformat(self)
281 # Use bruteforce ellipsis when there are a lot of non-blank characters
282 n_nonblank = len("".join(repr_.split()))
File ~\Anaconda3\envs\Python3.10\lib\pprint.py:157, in PrettyPrinter.pformat(self, object)
155 def pformat(self, object):
156 sio = _StringIO()
--> 157 self._format(object, sio, 0, 0, {}, 0)
158 return sio.getvalue()
File ~\Anaconda3\envs\Python3.10\lib\pprint.py:174, in PrettyPrinter._format(self, object, stream, indent, allowance, context, level)
172 self._readable = False
173 return
--> 174 rep = self._repr(object, context, level)
175 max_width = self._width - indent - allowance
176 if len(rep) > max_width:
File ~\Anaconda3\envs\Python3.10\lib\pprint.py:454, in PrettyPrinter._repr(self, object, context, level)
453 def _repr(self, object, context, level):
--> 454 repr, readable, recursive = self.format(object, context.copy(),
455 self._depth, level)
456 if not readable:
457 self._readable = False
File ~\Anaconda3\envs\Python3.10\lib\site-packages\sklearn\utils\_pprint.py:189, in _EstimatorPrettyPrinter.format(self, object, context, maxlevels, level)
188 def format(self, object, context, maxlevels, level):
--> 189 return _safe_repr(
190 object, context, maxlevels, level, changed_only=self._changed_only
191 )
File ~\Anaconda3\envs\Python3.10\lib\site-packages\sklearn\utils\_pprint.py:440, in _safe_repr(object, context, maxlevels, level, changed_only)
438 recursive = False
439 if changed_only:
--> 440 params = _changed_params(object)
441 else:
442 params = object.get_params(deep=False)
File ~\Anaconda3\envs\Python3.10\lib\site-packages\sklearn\utils\_pprint.py:93, in _changed_params(estimator)
89 def _changed_params(estimator):
90 """Return dict (param_name: value) of parameters that were given to
91 estimator with non-default values."""
---> 93 params = estimator.get_params(deep=False)
94 init_func = getattr(estimator.__init__, "deprecated_original", estimator.__init__)
95 init_params = inspect.signature(init_func).parameters
File ~\Anaconda3\envs\Python3.10\lib\site-packages\yellowbrick\base.py:342, in ModelVisualizer.get_params(self, deep)
334 def get_params(self, deep=True):
335 """
336 After v0.24 - scikit-learn is able to determine that ``self.estimator`` is
337 nested and fetches its params using ``estimator__param``. This functionality is
(...)
340 the estimator params.
341 """
--> 342 params = super(ModelVisualizer, self).get_params(deep=deep)
343 for param in list(params.keys()):
344 if param.startswith("estimator__"):
File ~\Anaconda3\envs\Python3.10\lib\site-packages\sklearn\base.py:211, in BaseEstimator.get_params(self, deep)
209 out = dict()
210 for key in self._get_param_names():
--> 211 value = getattr(self, key)
212 if deep and hasattr(value, "get_params"):
213 deep_items = value.get_params().items()
File ~\Anaconda3\envs\Python3.10\lib\site-packages\yellowbrick\utils\wrapper.py:42, in Wrapper.__getattr__(self, attr)
40 def __getattr__(self, attr):
41 # proxy to the wrapped object
---> 42 return getattr(self._wrapped, attr)
AttributeError: 'KMeans' object has no attribute 'k'
Plus a figure (which follows the error!):
The interesting thing is that it renders a plot that I didn't request at all! Also, I set timings=True, but there isn't any timing information on the plot! So maybe this means the algorithm didn't run at all (but then I don't know how it renders the result!). So I wonder, where is the problem?
Additional information:
yellowbrick version = 1.4
scikit-learn version = 1.1.1
I also tried these in VSCode and Jupyter Notebook (anaconda). The results are the same.
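For anyone comparing environments, the versions above can be read programmatically. This is a minimal sketch using only the standard library; the helper name installed_version is hypothetical, and it assumes the packages are installed under their usual distribution names (yellowbrick and scikit-learn):

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent.

    `installed_version` is a hypothetical helper name used only for this sketch.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Distribution names as published on PyPI (assumed to match the local install).
for name in ("yellowbrick", "scikit-learn"):
    print(name, installed_version(name))
```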
I tried something, but I need help understanding why I should have to define a class for this purpose. Code:
class find_n_cluster:
    def __init__(self, train_data, metric) -> None:
        self.data = train_data
        self.metric = metric
        self.visualizer = KElbowVisualizer(KMeans(), k=11, metric=self.metric, timings=True)

    def fit(self):
        self.visualizer.fit(self.data)

model = find_n_cluster(data, 'silhouette')
model.fit()
Result (which is unexpected because of the plot):
Again, I didn't even request any plot! Also, I don't know how defining a class for this job solves the error!
Update
I solved the issue of the plot being shown by adding plt.close() after self.visualizer.fit(self.data):
import matplotlib.pyplot as plt

class find_n_cluster:
    def __init__(self, train_data, metric) -> None:
        self.data = train_data
        self.metric = metric
        self.visualizer = KElbowVisualizer(KMeans(), k=11, metric=self.metric, timings=True)

    def fit(self):
        self.visualizer.fit(self.data)
        # Close the current figure so the notebook does not render it
        plt.close()

model = find_n_cluster(data, 'silhouette')
model.fit()
And this doesn't return or show anything, as expected. But still, I don't know how defining the class solved the error.
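One possible reading of the traceback, offered as an assumption rather than a confirmed diagnosis: its first frame is IPython's formatter and the last one is Wrapper.__getattr__, so the AttributeError seems to be raised while the notebook builds a repr of the value returned by the cell's last expression, not while fitting. If fit returns the visualizer itself, the notebook tries to display it; a fit method that returns None gives it nothing to display. The sketch below mimics that behaviour with hypothetical stand-in classes (Estimator, ProxyVisualizer, Holder); it is not yellowbrick's actual code:

```python
class Estimator:
    """Hypothetical stand-in for the wrapped estimator; it has no attribute `k`."""
    n_clusters = 8


class ProxyVisualizer:
    """Hypothetical wrapper that forwards unknown attributes to the estimator."""

    def __init__(self, estimator, k=10):
        self._wrapped = estimator
        # Assumption made for this sketch: `k` is NOT stored as `self.k`.
        self.k_values_ = list(range(2, k + 1))

    def __getattr__(self, attr):
        # Only called when normal lookup fails, so `self.k` ends up here.
        return getattr(self._wrapped, attr)

    def fit(self):
        # Returning self means a notebook would try to display this object.
        return self

    def __repr__(self):
        # Building the repr needs `self.k`, which is proxied to the estimator.
        return f"ProxyVisualizer(k={self.k})"


class Holder:
    """Mirrors the class in the question: its fit() returns None."""

    def __init__(self):
        self.visualizer = ProxyVisualizer(Estimator(), k=11)

    def fit(self):
        self.visualizer.fit()


returned = ProxyVisualizer(Estimator(), k=11).fit()  # fitting itself works
try:
    repr(returned)  # what a notebook does with the last expression of a cell
    message = None
except AttributeError as exc:
    message = str(exc)

print(message)
print(Holder().fit())  # None, so there is nothing to display
```

Under this reading, assigning the return value to a variable in the original code would sidestep the error in the same way the class does, since the cell would no longer end with an expression to display.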
Based on this, yellowbrick got updated to v1.5, and the problem with AttributeError: 'KMeans' object has no attribute 'k' got solved. But the problem with the plot being displayed automatically in Jupyter Notebook still persists.
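If only the number of clusters is needed and no figure at all, one alternative is to compute the silhouette coefficient directly with scikit-learn. This is a minimal sketch and not what KElbowVisualizer does internally; the function name silhouette_by_k, the toy array X, and the n_init and random_state values are assumptions chosen for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def silhouette_by_k(X, k_values, random_state=0):
    """Return {k: mean silhouette coefficient} for each k in k_values.

    Hypothetical helper; no plotting is involved.
    """
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return scores


# Toy data (not the data from the question): three well-separated groups.
X = np.array([[0.0], [0.1], [0.2],
              [10.0], [10.1], [10.2],
              [20.0], [20.1], [20.2]])

scores = silhouette_by_k(X, range(2, 6))
best_k = max(scores, key=scores.get)
print(scores)
print("best k:", best_k)
```

With the data from the question, the call would be silhouette_by_k(data, range(2, 12)), matching the k=11 upper bound used above.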