I'm trying to follow the example from the link below.
Everything up to this point runs fine, but the snippet below fails.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

vectorizer = CountVectorizer(
    analyzer='word',
    min_df=3,                          # ignore words that appear in fewer than 3 documents
    stop_words='english',              # remove English stop words
    lowercase=True,                    # convert all words to lowercase
    token_pattern='[a-zA-Z0-9]{3,}',   # keep tokens of at least 3 characters
    max_features=5000,                 # keep at most the 5000 most frequent words
)
data_matrix = vectorizer.fit_transform(df_clean['question_lemmatize_clean'])

lda_model = LatentDirichletAllocation(
    n_components=10,           # number of topics
    learning_method='online',
    random_state=20,
    n_jobs=-1,                 # use all available CPUs
)
lda_output = lda_model.fit_transform(data_matrix)

import pyLDAvis
import pyLDAvis.sklearn
pyLDAvis.enable_notebook()
pyLDAvis.sklearn.prepare(lda_model, data_matrix, vectorizer, mds='tsne')
When I run that code snippet, I get this error message.
AttributeError Traceback (most recent call last)
Cell In[83], line 29
27 import pyLDAvis.sklearn
28 pyLDAvis.enable_notebook()
---> 29 pyLDAvis.sklearn.prepare(lda_model, data_matrix, vectorizer, mds='tsne')
File ~\anaconda3\lib\site-packages\pyLDAvis\sklearn.py:94, in prepare(lda_model, dtm, vectorizer, **kwargs)
62 def prepare(lda_model, dtm, vectorizer, **kwargs):
63 """Create Prepared Data from sklearn's LatentDirichletAllocation and CountVectorizer.
64
65 Parameters
(...)
92 See `pyLDAvis.prepare` for **kwargs.
93 """
---> 94 opts = fp.merge(_extract_data(lda_model, dtm, vectorizer), kwargs)
95 return pyLDAvis.prepare(**opts)
File ~\anaconda3\lib\site-packages\pyLDAvis\sklearn.py:38, in _extract_data(lda_model, dtm, vectorizer)
37 def _extract_data(lda_model, dtm, vectorizer):
---> 38 vocab = _get_vocab(vectorizer)
39 doc_lengths = _get_doc_lengths(dtm)
40 term_freqs = _get_term_freqs(dtm)
File ~\anaconda3\lib\site-packages\pyLDAvis\sklearn.py:20, in _get_vocab(vectorizer)
19 def _get_vocab(vectorizer):
---> 20 return vectorizer.get_feature_names()
AttributeError: 'CountVectorizer' object has no attribute 'get_feature_names'
I suspect one of the libraries is out of date or mismatched, but I can't tell which one, and searching hasn't turned up anything that helps me debug this. Does anyone know what's wrong here?
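The method get_feature_names() was renamed to get_feature_names_out(). The old name was deprecated in scikit-learn 1.0 and removed in 1.2. Your traceback shows that the installed pyLDAvis still calls the old name (sklearn.py, line 20), which is why the AttributeError is raised inside pyLDAvis rather than in your own code.
get_feature_names_out() returns the output feature names of the transformation, which for a fitted CountVectorizer is its vocabulary.
Upgrading pyLDAvis to a release that calls get_feature_names_out() is the usual fix. As far as I know, recent releases also expose the scikit-learn helper as pyLDAvis.lda_model rather than pyLDAvis.sklearn, so the import may need to change as well.
See the scikit-learn documentation for CountVectorizer.get_feature_names_out.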
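If you can't change the installed versions, a stopgap is to look up the vocabulary under whichever name exists, or to attach the old name to your vectorizer instance so that older calling code still finds it. This is only a minimal sketch: feature_names and add_legacy_alias are hypothetical helper names of my own, not part of scikit-learn or pyLDAvis, and I haven't checked the alias against every pyLDAvis release.

```python
def feature_names(vectorizer):
    """Return the fitted vocabulary as a list, using whichever method exists."""
    if hasattr(vectorizer, "get_feature_names_out"):
        # Name used by scikit-learn 1.0 and later.
        return list(vectorizer.get_feature_names_out())
    # Name used by older scikit-learn releases.
    return list(vectorizer.get_feature_names())


def add_legacy_alias(vectorizer):
    """Attach the old method name to this one instance if it is missing.

    Assumption: the calling library only needs a callable named
    get_feature_names that returns the vocabulary.
    """
    if not hasattr(vectorizer, "get_feature_names"):
        vectorizer.get_feature_names = vectorizer.get_feature_names_out
    return vectorizer
```

With the vectorizer from the question, calling add_legacy_alias(vectorizer) before pyLDAvis.sklearn.prepare(...) gives pyLDAvis the method it is looking for. Keep in mind that get_feature_names_out() returns a NumPy array rather than a list, so treat this as a temporary workaround and prefer matching library versions.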