I scraped the following site: quotes
using the code below (the site relies on JavaScript, so I disabled it first):
import selenium
from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import pandas as pd
from selenium.webdriver.common.keys import Keys
browser = webdriver.Chrome()
browser.get("https://quotes.toscrape.com/")
elem = browser.find_elements(By.CLASS_NAME, 'author')  # Find the author elements
quot_choosing = browser.find_elements(By.CLASS_NAME, 'text')  # Find the quote text elements
autors = []
quotes = []
for author in elem:
    autors.append(author.text)
for quote in quot_choosing:
    quotes.append(quote.text)
print(autors)
print(quotes)
autor_saying = pd.DataFrame({"Author": autors, "Quotes": quotes})
autor_saying.to_csv("quotes.csv", index=False)
print(autor_saying.head())
browser.quit()
This saved the authors and quotes to a CSV file, which I then read as shown below:
import pandas as pd
from bertopic import BERTopic
model = BERTopic()
summarization = []
data = pd.read_csv("quotes.csv")
print(data.head())
for index, row in data.iterrows():
    topics, probs = model.fit_transform([row['Quotes']])
    print(topics)
Here is the result:
            Author                                             Quotes
0  Albert Einstein  “The world as we have created it is a process ...
1     J.K. Rowling  “It is our choices, Harry, that show what we t...
2  Albert Einstein  “There are only two ways to live your life. On...
3      Jane Austen  “The person, be it gentleman or lady, who has ...
4   Marilyn Monroe  “Imperfection is beauty, madness is genius and...
Additionally, I want to use the BERTopic model to detect topics, based on this site: topic modeling
But my code gives me the following error:
ValueError: Transform unavailable when model was fit with only a single data sample.
Could you please help me fix it? How can I detect the topics present in the sentences?
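You should train on all quotes at once rather than one by one. BERTopic finds topics by clustering documents, so fitting it on a single quote leaves it nothing to cluster, which is what the error is telling you. So instead of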
for index, row in data.iterrows():
    topics, probs = model.fit_transform([row['Quotes']])
    print(topics)
try
topics, probs = model.fit_transform(data['Quotes'].tolist())
data['Topic'] = topics
data['Probability'] = probs
print(data.head())
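
If you want to guard against this error coming back (for example when the CSV ends up with one row or with empty cells), a small wrapper along these lines may help. This is only a sketch: `fit_topics` and `min_docs` are hypothetical names, not part of BERTopic, and `model` is assumed to be any object with a `fit_transform(list_of_strings)` method that returns `(topics, probs)`, as `BERTopic()` does.

```python
from typing import Any, List, Sequence, Tuple


def fit_topics(
    model: Any, documents: Sequence[Any], min_docs: int = 2
) -> Tuple[List[str], List[int], Any]:
    """Fit `model` once on all usable documents; return (docs, topics, probs).

    Hypothetical helper: `model` only needs a fit_transform(list_of_str) method.
    """
    # Keep non-empty strings only. Empty CSV cells are read by pandas as NaN
    # (a float), so the isinstance check drops them as well.
    docs = [d.strip() for d in documents if isinstance(d, str) and d.strip()]
    if len(docs) < min_docs:
        raise ValueError(
            f"Need at least {min_docs} documents to fit topics, got {len(docs)}."
        )
    # One call over the whole corpus, not one call per row.
    topics, probs = model.fit_transform(docs)
    return docs, list(topics), probs
```

You would call it as `docs, topics, probs = fit_topics(BERTopic(), data['Quotes'].tolist())`. The cleaned `docs` list is returned so that topics stay aligned with the texts even if some rows were dropped. Keep in mind that `min_docs=2` only avoids the single-sample case; the ten quotes on the first page are likely still too few for meaningful topics, so scraping more pages first is probably worthwhile.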