
Python web-scraper not working for TripAdvisor


I am trying to write a simple Python scraper in order to save all the reviews of a specific place on TripAdvisor.

The link I am using as an example is:

https://www.tripadvisor.com/Attraction_Review-g319796-d5988326-Reviews-or50-Museo_de_Altamira-Santillana_del_Mar_Cantabria.html

Here is the code I am using, which is supposed to print the page's HTML:

from bs4 import BeautifulSoup
import requests

url = "https://www.tripadvisor.com/Attraction_Review-g319796-d5988326-Reviews-or50-Museo_de_Altamira-Santillana_del_Mar_Cantabria.html"

r = requests.get(url)
data = r.text
soup = BeautifulSoup(data)
print(soup)

If I run this code in the console, it hangs on requests.get(url) for a long time without any output. With another URL (for example url = "https://stackoverflow.com/"), the HTML is displayed correctly right away. Why doesn't this work for TripAdvisor, and how can I obtain its HTML?
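When diagnosing a hang like this, it helps to know that requests applies no timeout by default, so requests.get() can block indefinitely if the server never answers. The sketch below uses a hypothetical helper, probe(), with arbitrary timeout values. It turns the silent hang into a message and also prints the User-Agent that requests sends by default:

```python
import requests

# requests identifies itself as "python-requests/<version>" unless a
# User-Agent header is passed explicitly.
print(requests.utils.default_user_agent())


def probe(url, timeout=(5, 15)):
    """Hypothetical helper: return the HTTP status code, or None on failure.

    timeout is (connect, read) in seconds; the values are arbitrary.
    """
    try:
        return requests.get(url, timeout=timeout).status_code
    except requests.exceptions.Timeout:
        # Raised instead of waiting forever when the server stays silent.
        print(f"No answer from {url} within {timeout} s")
    except requests.exceptions.RequestException as exc:
        print(f"Request failed: {exc}")
    return None
```

Calling probe(url) with the TripAdvisor URL then either returns a status code or reports the timeout, instead of leaving the console pending.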


Solution

  • Adding a User-Agent header should solve the first part of your issue. Some sites serve different content depending on the User-Agent, or use it for bot/automation detection. By default, requests identifies itself as "python-requests/<version>". Open DevTools in your browser and copy the User-Agent from one of your requests:

    headers = {'User-Agent': 'Mozilla/5.0'}
    r = requests.get(url, headers=headers)
    

    Example

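    from bs4 import BeautifulSoup
    import requests
    
    url = "https://www.tripadvisor.com/Attraction_Review-g319796-d5988326-Reviews-or50-Museo_de_Altamira-Santillana_del_Mar_Cantabria.html"
    # Browser-like User-Agent instead of requests' default "python-requests/<version>"
    headers = {'User-Agent': 'Mozilla/5.0'}
    
    r = requests.get(url, headers=headers)
    # Name the parser explicitly; otherwise bs4 guesses one and emits a warning
    soup = BeautifulSoup(r.text, 'html.parser')
    data = []
    
    # These selectors match TripAdvisor's markup at the time of writing and may break if it changes
    for e in soup.select('#tab-data-qa-reviews-0 [data-automation="reviewCard"]'):
        data.append({
            'rating': e.select_one('svg[aria-label]')['aria-label'],
            # Relative link to the individual review page
            'profilUrl': e.select_one('a[tabindex="0"]').get('href'),
            'content': e.select_one('div:has(>a[tabindex="0"]) + div + div').text
        })
    
    data  # displays the list in a REPL/notebook; use print(data) in a script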
    

    Output

    [{'rating': '5.0 of 5 bubbles',
      'profilUrl': '/ShowUserReviews-g319796-d5988326-r620396152-Museo_de_Altamira-Santillana_del_Mar_Cantabria.html',
      'content': "We were fortunate to get in without pre-booking.What a find. A UNESCO site in the middle of the countryside.The replication cave is so awesome and authentic, hard to believe it's not the real thing.The museum is beautifully curated, great for students, and anyone interested in archeology and the beginnings of human existence.Definitely worth visiting. We nearly missed out 😕Read more"},
     {'rating': '5.0 of 5 bubbles',
      'profilUrl': '/ShowUserReviews-g319796-d5988326-r618358203-Museo_de_Altamira-Santillana_del_Mar_Cantabria.html',
      'content': 'Beautiful site with great replica’s of the original cave, excellent exposition, poor film as an introduction however!The most urgent issue: long waiting because you need a slot to enter. This could be done 1000% better and in every decent museum it is done better! Staff probably civil servants with no great desire to make you enjoy the visit. Building urgently needs a revamp, no exposure at all!Read more'},...]
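Since the goal is to save all the reviews, the steps above can be wrapped into a small reusable scraper. This is a minimal sketch, assuming the selectors from the answer still match the page. The helper names page_url(), parse_reviews() and scrape() are hypothetical. page_url() assumes that later pages follow the "-Reviews-orNN-" offset pattern visible in the question's URL, and page_size=10 is a guess, not something TripAdvisor documents:

```python
import re

import requests
from bs4 import BeautifulSoup

# Any browser-like User-Agent; copy the real one from DevTools if needed.
HEADERS = {"User-Agent": "Mozilla/5.0"}


def page_url(url, offset):
    """Return url with its review offset set to offset (assumed URL pattern)."""
    # Assumption: the first page has no "-orNN" part, later pages do.
    repl = "-Reviews-" if offset == 0 else f"-Reviews-or{offset}-"
    return re.sub(r"-Reviews(-or\d+)?-", repl, url, count=1)


def parse_reviews(html):
    """Extract review cards from a page, using the answer's selectors."""
    soup = BeautifulSoup(html, "html.parser")
    reviews = []
    for card in soup.select('#tab-data-qa-reviews-0 [data-automation="reviewCard"]'):
        rating = card.select_one("svg[aria-label]")
        link = card.select_one('a[tabindex="0"]')
        content = card.select_one('div:has(> a[tabindex="0"]) + div + div')
        # Skip cards whose markup differs instead of crashing on None.
        if rating is None or link is None or content is None:
            continue
        reviews.append({
            "rating": rating["aria-label"],
            "profilUrl": link.get("href"),
            "content": content.get_text(),
        })
    return reviews


def scrape(url, pages=3, page_size=10):
    """Fetch up to `pages` review pages; page_size=10 is an assumption."""
    all_reviews = []
    for i in range(pages):
        r = requests.get(page_url(url, i * page_size), headers=HEADERS, timeout=30)
        r.raise_for_status()  # fail loudly on 403/503 instead of parsing an error page
        page = parse_reviews(r.text)
        if not page:  # no more cards, or the markup changed
            break
        all_reviews.extend(page)
    return all_reviews
```

scrape(url, pages=5) would then fetch up to five pages and stop early when a page yields no review cards. Keeping parsing separate from fetching means parse_reviews() can be checked against saved HTML without hitting the site.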