Python BeautifulSoup web scraping: viewing a Tripadvisor review


I am new to web scraping and am trying to view the list of reviews for a particular hotel. As a first step, I am trying to print a single review by selecting its class, but I get no output at all. Even the status code of the request is never printed, so I believe my code is taking a very long time to run.

Does web scraping normally take this long to run, or is there a problem with my code?

import requests
from bs4 import BeautifulSoup

headers = {
    'Access-Control-Allow-Origin': '*',
    'Access-Control-Allow-Methods': 'GET',
    'Access-Control-Allow-Headers': 'Content-Type',
    'Access-Control-Max-Age': '3600',
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0'
    }

url = "https://www.tripadvisor.ca/Hotel_Review-g154913-d1587398-Reviews-Le_Germain_Hotel_Calgary-Calgary_Alberta.html"
req = requests.get(url, headers)

print (req.status_code)
soup = BeautifulSoup(req.content, 'html.parser')

review = soup.find_all(class_="XllAv H4 _a").get_text()
print(review)

Solution

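  • I changed a few header keys and some of the request parameters. requests.get(url, headers) passes the dictionary as the second positional argument, which is params, so the headers were never sent; they have to be passed as headers=headers. A timeout makes the request fail after a few seconds instead of waiting indefinitely. I also got an error on .get_text(), because find_all() returns a list of tags rather than a single tag, so I replaced it with a loop over the results.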

    import requests
    from bs4 import BeautifulSoup
    
    # Browser-like request headers. Access-Control-* entries are CORS response
    # headers and have no effect when sent by a client, so they are not needed.
    headers = {
            'accept': '*/*',
            'accept-encoding': 'gzip, deflate',
            'accept-language': 'en,mr;q=0.9',
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36'}
    
    url = "https://www.tripadvisor.ca/Hotel_Review-g154913-d1587398-Reviews-Le_Germain_Hotel_Calgary-Calgary_Alberta.html"
    # Pass headers by keyword (the second positional argument is params) and
    # set a timeout so the call raises instead of hanging.
    req = requests.get(url, headers=headers, timeout=5)
    print(req.status_code)
    soup = BeautifulSoup(req.content, 'html.parser')
    
    # find_all() returns a list of tags, so read the text of each one in turn.
    for x in soup.body.find_all(class_="XllAv H4 _a"):
        print(x.text)
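
To see both problems without hitting Tripadvisor, here is a minimal offline sketch. The URL, the header value and the HTML snippet are made up for illustration; only the class name is taken from the question. It builds the requests with requests.Request(...).prepare() instead of sending them, which is just a way to inspect where the dictionary ends up.

```python
import requests
from bs4 import BeautifulSoup

# --- 1. Positional vs keyword headers (nothing is sent over the network) ---
url = "https://example.com/page"        # placeholder URL, not from the post
headers = {"User-Agent": "demo-agent"}  # hypothetical header value

# requests.get(url, headers) binds the dict to `params`. Preparing the
# equivalent requests shows where the values end up in each case.
positional = requests.Request("GET", url, params=headers).prepare()
keyword = requests.Request("GET", url, headers=headers).prepare()

print(positional.url)                      # dict was encoded into the query string
print(keyword.headers.get("User-Agent"))   # dict was set as real request headers

# --- 2. find_all() returns a collection, not a single tag ---
# Made-up HTML that reuses the class name from the question.
html = """
<html><body>
  <div class="XllAv H4 _a">Great stay</div>
  <div class="XllAv H4 _a">Friendly staff</div>
</body></html>
"""
soup = BeautifulSoup(html, "html.parser")
matches = soup.find_all(class_="XllAv H4 _a")

try:
    matches.get_text()  # the call from the question
except AttributeError:
    print("the find_all() result has no get_text()")

# Call get_text() on each tag instead.
reviews = [tag.get_text(strip=True) for tag in matches]
print(reviews)
```

Note that Tripadvisor's class names look auto-generated and may change, and the review text may be rendered by JavaScript, so an empty result on the live page does not necessarily mean the code is wrong.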