python, anaconda, data-science, topic-modeling

Getting an error from hdbscan while importing bertopic


I'm trying to import bertopic, but it raises the error below. I tried different versions and recreated the environment, but the result is the same. I'm using an Apple M2 Pro processor.

Library versions:
BERTopic 0.15.0
HDBSCAN 0.8.29
umap-learn 0.5.3 

TypeError                                 Traceback (most recent call last)
Cell In[3], line 4
      2 import pandas as pd
      3 # import matplotlib.pyplot as plt
----> 4 from bertopic import BERTopic
      5 import gensim
      6 import gensim.corpora as corpora

File ~/miniforge3/envs/bertopic/lib/python3.8/site-packages/bertopic/__init__.py:1
----> 1 from bertopic._bertopic import BERTopic
      3 __version__ = "0.15.0"
      5 __all__ = [
      6     "BERTopic",
      7 ]

File ~/miniforge3/envs/bertopic/lib/python3.8/site-packages/bertopic/_bertopic.py:37
     34 from typing import List, Tuple, Union, Mapping, Any, Callable, Iterable
     36 # Models
---> 37 import hdbscan
     38 from umap import UMAP
     39 from sklearn.preprocessing import normalize

File ~/miniforge3/envs/bertopic/lib/python3.8/site-packages/hdbscan/__init__.py:1
----> 1 from .hdbscan_ import HDBSCAN, hdbscan
      2 from .robust_single_linkage_ import RobustSingleLinkage, robust_single_linkage
      3 from .validity import validity_index

File ~/miniforge3/envs/bertopic/lib/python3.8/site-packages/hdbscan/hdbscan_.py:40
     37 from .plots import CondensedTree, SingleLinkageTree, MinimumSpanningTree
     38 from .prediction import PredictionData
---> 40 FAST_METRICS = KDTree.valid_metrics + BallTree.valid_metrics + ["cosine", "arccos"]
     42 # Author: Leland McInnes <leland.mcinnes@gmail.com>
     43 #         Steve Astels <sastels@gmail.com>
     44 #         John Healy <jchealy@gmail.com>
     45 #
     46 # License: BSD 3 clause
     47 from numpy import isclose

TypeError: unsupported operand type(s) for +: 'builtin_function_or_method' and 'builtin_function_or_method'

Solution

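  • This is tracked in a GitHub issue and is caused by a change in scikit-learn 1.3.0. As the traceback shows, KDTree.valid_metrics and BallTree.valid_metrics are now methods rather than lists, so hdbscan can no longer concatenate them with +.

    To avoid the bug, install a scikit-learn version below 1.3.0, for example with pip install -U scikit-learn==1.2.2. The creator of HDBSCAN has a fix in mind, so later scikit-learn versions will likely work at some point.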
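To check whether an environment is affected before importing bertopic, a minimal sketch along these lines may help. The helper names `sklearn_breaks_hdbscan` and `installed_sklearn_version` are made up for illustration and are not part of scikit-learn, hdbscan, or BERTopic. The 1.3 cutoff is taken from the advice above, and the check assumes an hdbscan release without the fix (such as 0.8.29 from the question).

```python
import re
from importlib import metadata  # in the standard library since Python 3.8


def sklearn_breaks_hdbscan(sklearn_version: str) -> bool:
    """Hypothetical helper: True if the version string is 1.3 or later."""
    match = re.match(r"(\d+)\.(\d+)", sklearn_version)
    if match is None:
        raise ValueError(f"unrecognised version string: {sklearn_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Compare as integer tuples so that 1.10 sorts after 1.3.
    return (major, minor) >= (1, 3)


def installed_sklearn_version():
    """Hypothetical helper: installed scikit-learn version, or None if absent."""
    try:
        return metadata.version("scikit-learn")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_sklearn_version()
    if found is None:
        print("scikit-learn is not installed in this environment")
    elif sklearn_breaks_hdbscan(found):
        print(f"scikit-learn {found}: an unpatched hdbscan may fail on import")
    else:
        print(f"scikit-learn {found}: older than 1.3.0, the import should work")
```

Run it inside the same conda environment as the notebook kernel. If it reports 1.3 or later, pinning scikit-learn as described above should let the import succeed.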