[SOLVED] topic modeling from quotes

Based on the following link: quotes

With the help of the following code (the site relies on JavaScript, so I disabled it first):

import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Chrome()
browser.get("https://quotes.toscrape.com/")

# Collect the author and quote-text elements on the page
author_elements = browser.find_elements(By.CLASS_NAME, 'author')
quote_elements = browser.find_elements(By.CLASS_NAME, 'text')
authors = [element.text for element in author_elements]
quotes = [element.text for element in quote_elements]
print(authors)
print(quotes)

# Save the author/quote pairs to a CSV file
author_saying = pd.DataFrame({"Author": authors, "Quotes": quotes})
author_saying.to_csv("quotes.csv", index=False)
print(author_saying.head())
browser.quit()
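
The script above loads a single page and collects the authors and the quotes in two separate lists. A minimal sketch of a variant that reads each pair from one container and follows the pager could look like the following. It rests on assumptions the post does not confirm: that every quote sits in an element with class `quote`, and that the pager exposes a link matching `li.next > a`. `scrape_quotes` and `max_pages` are hypothetical names.

```python
def scrape_quotes(browser, start_url="https://quotes.toscrape.com/", max_pages=20):
    """Collect author/quote pairs page by page (hypothetical helper).

    Assumptions: each quote lives in an element with class "quote", and the
    pager offers a "li.next > a" link until the last page is reached.
    """
    # Imported lazily so the helper can be defined without Selenium installed.
    from selenium.webdriver.common.by import By

    rows = []
    browser.get(start_url)
    for _ in range(max_pages):
        # Read author and text from the same container so the two columns
        # cannot drift out of alignment.
        for block in browser.find_elements(By.CLASS_NAME, "quote"):
            rows.append({
                "Author": block.find_element(By.CLASS_NAME, "author").text,
                "Quotes": block.find_element(By.CLASS_NAME, "text").text,
            })
        next_links = browser.find_elements(By.CSS_SELECTOR, "li.next > a")
        if not next_links:
            break  # no "Next" link: last page reached
        next_links[0].click()
    return rows
```

It would be called with an open `browser`, for example `pd.DataFrame(scrape_quotes(browser)).to_csv("quotes.csv", index=False)`, before `browser.quit()`.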

I now have the author and quote information in a CSV file, which I then read as shown below:

import pandas as pd
from bertopic import BERTopic
model =BERTopic()

summarization =[]
data =pd.read_csv("quotes.csv")
print(data.head())
for  index, row in data.iterrows():
    topics, probs =model.fit_transform([row['Quotes']])
    print(topics)

Here is the result:

            Author                                             Quotes
0  Albert Einstein  “The world as we have created it is a process ...
1     J.K. Rowling  “It is our choices, Harry, that show what we t...
2  Albert Einstein  “There are only two ways to live your life. On...
3      Jane Austen  “The person, be it gentleman or lady, who has ...
4   Marilyn Monroe  “Imperfection is beauty, madness is genius and...

Additionally, I want to use the BERTopic model to detect topics, as described on this site: topic modeling

But my code gives me the following error:

ValueError: Transform unavailable when model was fit with only a single data sample.

Could you please help me fix it? How can I detect the topics present in the sentences?

Solution

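You should train on all quotes at once, not one by one. Each pass of your loop fits the model on a list containing a single quote, which is exactly what the error message complains about. So instead of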

for  index, row in data.iterrows():
    topics, probs =model.fit_transform([row['Quotes']])
    print(topics)

try

topics, probs = model.fit_transform(data['Quotes'].tolist())
data['Topic'] = topics
data['Probability'] = probs
print(data.head())
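
A slightly fuller sketch of the same fix, in case it helps: it fits one model on the whole list, guards against the single-sample case up front, and prints the topic overview. `fit_topics`, `main`, and `min_docs` are hypothetical names, and the default of 2 only protects against the exact error above. Topic models generally need far more documents than that, so with only a handful of quotes the fit may still fail or put every quote into the outlier topic -1.

```python
def fit_topics(docs, min_docs=2):
    """Fit a single BERTopic model on the whole corpus (hypothetical helper)."""
    docs = [str(d) for d in docs]
    if len(docs) < min_docs:
        raise ValueError(
            f"need at least {min_docs} documents to fit a topic model, got {len(docs)}"
        )
    # Imported lazily: bertopic is a heavy dependency.
    from bertopic import BERTopic

    model = BERTopic()
    topics, probs = model.fit_transform(docs)  # one call for all documents
    return model, topics, probs


def main():
    import pandas as pd

    data = pd.read_csv("quotes.csv")
    model, topics, probs = fit_topics(data["Quotes"].tolist())
    data["Topic"] = topics
    print(data.head())
    # One row per discovered topic; topic -1 collects the outliers.
    print(model.get_topic_info())
```

Calling `main()` runs the whole pipeline on `quotes.csv`. Passing a single quote, as the original loop did, now fails early with a clear message instead of the error above.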