
AttributeError: 'CountVectorizer' object has no attribute 'get_feature_names' -- Topic Modeling -- Latent Dirichlet Allocation


I'm trying to follow the example from the link below.

https://medium.datadriveninvestor.com/trump-tweets-topic-modeling-using-latent-dirichlet-allocation-e4f93b90b6fe

All the code up to this point works, but the code below does not work.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

vectorizer = CountVectorizer(
    analyzer='word',
    min_df=3,                         # ignore words that appear in fewer than 3 documents
    stop_words='english',             # remove English stop words
    lowercase=True,                   # convert all words to lowercase
    token_pattern='[a-zA-Z0-9]{3,}',  # keep tokens of at least 3 characters
    max_features=5000,                # keep at most 5000 unique words
)


data_matrix = vectorizer.fit_transform(df_clean['question_lemmatize_clean'])

                                                                    
lda_model = LatentDirichletAllocation(
    n_components=10,           # number of topics
    learning_method='online',
    random_state=20,
    n_jobs=-1,                 # use all available CPUs
)

lda_output = lda_model.fit_transform(data_matrix)
                                                                    

import pyLDAvis
import pyLDAvis.sklearn
pyLDAvis.enable_notebook()
pyLDAvis.sklearn.prepare(lda_model, data_matrix, vectorizer, mds='tsne')    

When I run that code snippet, I get this error message.

AttributeError                            Traceback (most recent call last)
Cell In[83], line 29
     27 import pyLDAvis.sklearn
     28 pyLDAvis.enable_notebook()
---> 29 pyLDAvis.sklearn.prepare(lda_model, data_matrix, vectorizer, mds='tsne')

File ~\anaconda3\lib\site-packages\pyLDAvis\sklearn.py:94, in prepare(lda_model, dtm, vectorizer, **kwargs)
     62 def prepare(lda_model, dtm, vectorizer, **kwargs):
     63     """Create Prepared Data from sklearn's LatentDirichletAllocation and CountVectorizer.
     64 
     65     Parameters
   (...)
     92     See `pyLDAvis.prepare` for **kwargs.
     93     """
---> 94     opts = fp.merge(_extract_data(lda_model, dtm, vectorizer), kwargs)
     95     return pyLDAvis.prepare(**opts)

File ~\anaconda3\lib\site-packages\pyLDAvis\sklearn.py:38, in _extract_data(lda_model, dtm, vectorizer)
     37 def _extract_data(lda_model, dtm, vectorizer):
---> 38     vocab = _get_vocab(vectorizer)
     39     doc_lengths = _get_doc_lengths(dtm)
     40     term_freqs = _get_term_freqs(dtm)

File ~\anaconda3\lib\site-packages\pyLDAvis\sklearn.py:20, in _get_vocab(vectorizer)
     19 def _get_vocab(vectorizer):
---> 20     return vectorizer.get_feature_names()

AttributeError: 'CountVectorizer' object has no attribute 'get_feature_names'

I feel like, perhaps, some library is not updated correctly, but I can't tell, and when I Google it, I'm not getting great results to help me debug this thing. Anyone know what's wrong here?


Solution

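  • get_feature_names() was deprecated in scikit-learn 1.0 and removed in 1.2. Its replacement is get_feature_names_out(), which returns the output feature names of the transformation (for CountVectorizer, the vocabulary terms). The traceback shows that the failing call sits inside pyLDAvis (pyLDAvis\sklearn.py, line 20), not in your own code, so the installed pyLDAvis predates the change. Either upgrade pyLDAvis (newer releases call get_feature_names_out(); as far as I know, 3.4.0 also renamed the pyLDAvis.sklearn module to pyLDAvis.lda_model) or pin scikit-learn below 1.2.

    Documentation: see CountVectorizer.get_feature_names_out in the scikit-learn API reference.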
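
    If you cannot change the installed versions right away, something like the sketch below may help. It is only an illustration, not part of scikit-learn or pyLDAvis: feature_names and add_legacy_alias are hypothetical helper names I made up. The first reads the vocabulary with whichever method the vectorizer has; the second attaches the old method name to one vectorizer instance so that an older pyLDAvis can still call it.

```python
def feature_names(vectorizer):
    """Return the vocabulary of a fitted vectorizer as a list.

    Hypothetical helper: prefers get_feature_names_out() (scikit-learn 1.0+)
    and falls back to get_feature_names() on older releases.
    """
    if hasattr(vectorizer, "get_feature_names_out"):
        return list(vectorizer.get_feature_names_out())
    return list(vectorizer.get_feature_names())


def add_legacy_alias(vectorizer):
    """Expose get_feature_names() on an instance that only has the new method.

    Hypothetical workaround: this patches a single object, not the class,
    so other vectorizers are left untouched.
    """
    if not hasattr(vectorizer, "get_feature_names"):
        vectorizer.get_feature_names = vectorizer.get_feature_names_out
    return vectorizer


# Intended use with the objects from the question (assumes they already exist):
#   add_legacy_alias(vectorizer)
#   pyLDAvis.sklearn.prepare(lda_model, data_matrix, vectorizer, mds='tsne')
```

    Treat the alias as a stopgap. Upgrading pyLDAvis is the cleaner fix, because the patch has to be reapplied to every new vectorizer you create.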