I'm scraping product reviews from this website: https://www.lazada.com.my/products/xiaomi-mi-a1-4gb-ram-32gb-rom-i253761547-s336359472.html?spm=a2o4k.searchlistcategory.list.64.71546883QBZiNT&search=1
I managed to get the reviews, but only the first page of them:
import pandas as pd
from urllib.request import Request, urlopen as uReq  # package for web scraping
from bs4 import BeautifulSoup as soup

def make_soup(website):
    req = Request(website, headers={'User-Agent': 'Mozilla/5.0'})
    uClient = uReq(req)
    page_html = uClient.read()
    uClient.close()
    page_soup = soup(page_html, 'html.parser')
    return page_soup

lazada_url = 'https://www.lazada.com.my/products/xiaomi-mi-a1-4gb-ram-32gb-rom-i253761547-s336359472.html?spm=a2o4k.searchlistcategory.list.64.71546883QBZiNT&search=1'
website = make_soup(lazada_url)
news_headlines = pd.DataFrame(columns=['reviews', 'sentiment', 'score'])
headlines = website.findAll('div', attrs={"class": "item-content"})
n = 0
for item in headlines:
    top = item.div
    #print(top)
    #print()
    text_headlines = top.text
    print(text_headlines)
    print()
    n += 1
    # store the text in the declared 'reviews' column (was the undeclared 'title')
    news_headlines.loc[n-1, 'reviews'] = text_headlines
The result only shows the first page of reviews. How can I do this for all pages? There is no page number in the URL for me to loop through.
Here is the output I get for the first page; I need the same for every page of reviews:
I like this phone very much and it's global version. I recommend this phone for who like gaming. Delivery just took 3 days only. Thanks Lazada
Item was received in just two days and was wonderfully wrapped. Thanks for the excellent services Lazada!
Very happy with the phone. It's original, it arrived in good condition. Built quality is superb for a budget phone.
The delivery is very fast just take one day to reach at my home. However, the tax invoice is not attached. How do I get the tax invoice?
great deal from lazada. anyway, i do not find any tax invoice. please do email me the tax invoice. thank you.
You can scrape the pagination buttons at the bottom of the reviews to find the minimum and maximum page numbers, then request each page in turn:
import requests
from bs4 import BeautifulSoup as soup

def get_page_reviews(content: soup) -> dict:
    # every review is a div.item inside div.mod-reviews
    rs = content.find('div', {'class': 'mod-reviews'}).find_all('div', {'class': 'item'})
    reviews = [i.find('div', {'class': 'item-content'}).find('div', {'class': 'content'}).text for i in rs]
    # the rating is counted as the number of <img> tags in div.top
    stars = [len(c.find('div', {'class': 'top'}).find_all('img')) for c in rs]
    _by = [i.find('div', {'class': 'middle'}).find('span').text for i in rs]
    return {'stars': stars, 'reviews': reviews, 'authors': _by}

d = soup(requests.get('https://www.lazada.com.my/products/xiaomi-mi-a1-4gb-ram-32gb-rom-i253761547-s336359472.html?spm=a2o4k.searchlistcategory.list.64.71546883QBZiNT&search=1').text, 'html.parser')
# keep only the numeric pagination buttons (skips empty or non-numeric labels)
results = [int(b.text) for b in d.find_all('button', {'class': 'next-pagination-item'}) if b.text.strip().isdecimal()]
all_reviews = []  # reviews collected across every page
for i in range(min(results), max(results)+1):
    new_url = f'https://www.lazada.com.my/products/xiaomi-mi-a1-4gb-ram-32gb-rom-i253761547-s336359472.html?spm=a2o4k.searchlistcategory.list.64.71546883QBZiNT&search={i}'
    # now, can use new_url to request the next page of reviews
    r = get_page_reviews(soup(requests.get(new_url).text, 'html.parser'))
    final_result = [{'stars': a, 'author': b, 'review': c} for a, b, c in zip(r['stars'], r['authors'], r['reviews'])]
    all_reviews.extend(final_result)  # keep this page's reviews instead of overwriting them
Output (for first page):
[{'stars': 5, 'author': 'by Ridwan R.', 'review': "I like this phone very much and it's global version. I recommend this phone for who like gaming. Delivery just took 3 days only. Thanks Lazada"},
 {'stars': 5, 'author': 'by Razli A.', 'review': 'Item was received in just two days and was wonderfully wrapped. Thanks for the excellent services Lazada!'},
 {'stars': 5, 'author': 'by Nur F.', 'review': "Very happy with the phone. It's original, it arrived in good condition. Built quality is superb for a budget phone."},
 {'stars': 5, 'author': 'by Muhammad S.', 'review': 'The delivery is very fast just take one day to reach at my home. However, the tax invoice is not attached. How do I get the tax invoice?'},
 {'stars': 5, 'author': 'by Xavier Y.', 'review': 'great deal from lazada. anyway, i do not find any tax invoice. please do email me the tax invoice. thank you.'}]
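
Since the question asks for the reviews from all pages in one place, here is a minimal sketch of the two bookkeeping steps, kept separate from the network calls so they can be checked offline. It assumes each page yields a dict shaped like the return value of get_page_reviews. The helper names page_numbers and flatten_pages, and the sample data, are made up for illustration and are not part of the code above or of any library.

```python
from typing import Dict, Iterable, List


def page_numbers(button_texts: Iterable[str]) -> range:
    """Turn pagination button labels into an inclusive range of page numbers.

    Labels that are not plain numbers (arrows, ellipses, empty strings)
    are ignored.
    """
    numbers = [int(t) for t in button_texts if t.strip().isdecimal()]
    if not numbers:
        # Assumption: no numeric buttons means there is only one page.
        return range(1, 2)
    return range(min(numbers), max(numbers) + 1)


def flatten_pages(pages: Iterable[Dict[str, list]]) -> List[dict]:
    """Merge per-page dicts of parallel lists into one flat list of records."""
    records = []
    for page in pages:
        rows = zip(page['stars'], page['authors'], page['reviews'])
        for stars, author, review in rows:
            records.append({'stars': stars, 'author': author, 'review': review})
    return records


if __name__ == '__main__':
    # Made-up data standing in for two scraped pages.
    sample_pages = [
        {'stars': [5], 'authors': ['by A.'], 'reviews': ['good']},
        {'stars': [4, 3], 'authors': ['by B.', 'by C.'], 'reviews': ['ok', 'meh']},
    ]
    print(list(page_numbers(['', '1', '2', '...', '5'])))
    for record in flatten_pages(sample_pages):
        print(record)
```

The flat list of records can then be passed to pd.DataFrame(records) to get one row per review, which matches the DataFrame the question started with.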