I am trying to get the URLs of all the restaurants in Singapore, but my code is not working:
import requests
from bs4 import BeautifulSoup

data = requests.get("https://www.tripadvisor.com.sg/Restaurants-g294265-Singapore.html").text
soup = BeautifulSoup(data, "html.parser")
for link in soup.find_all('a', {'property_title'}):
    print('https://www.tripadvisor.com/Restaurant_Review-g294265-' + link.get('href'))
    print(link.string)
It keeps loading and never finishes at the line soup = BeautifulSoup(data, "html.parser").
I don't know why this happens, since the same approach works well for other sites.
Is this because TripAdvisor blocks crawling, or is my code wrong?
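Regarding "It keeps on loading and loading again": the hang is most likely in requests.get (the server doesn't answer requests sent with the default python-requests user-agent), not in BeautifulSoup. To get a response, add the user-agent header: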
headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
}
data = requests.get(
    "https://www.tripadvisor.com.sg/Restaurants-g294265-Singapore.html", headers=headers
).text
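If the request still seems to hang, it can help to pass a timeout so the call fails with an error instead of waiting indefinitely (requests.get accepts a timeout= argument as well). Here is a minimal sketch of the same header-plus-timeout idea using only the standard library's urllib.request, in case you want to avoid the extra dependency. The helper names build_request and fetch_html and the 10-second default are my own assumptions, not anything TripAdvisor requires.

```python
import urllib.request

# Browser-like User-Agent; the exact string is only an example.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
)


def build_request(url: str, user_agent: str = USER_AGENT) -> urllib.request.Request:
    """Hypothetical helper: a GET request carrying a User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Hypothetical helper: download a page, raising an error after `timeout` seconds."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        # Use the charset the server declares, falling back to UTF-8.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


# Usage (performs a real network request):
# html = fetch_html("https://www.tripadvisor.com.sg/Restaurants-g294265-Singapore.html")
```

With a timeout in place, a blocked or stalled request surfaces as an exception you can catch, rather than a script that appears to freeze.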
But the data is loaded dynamically, and requests doesn't run JavaScript, so it only sees the initial HTML and not dynamically loaded content. However, the data is also embedded in the page source in JSON format. (It's not clear exactly what you want to scrape.) To get all the data, you can use the json and re modules:
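import json
import re
import requests
...
data = requests.get(
    "https://www.tripadvisor.com.sg/Restaurants-g294265-Singapore.html", headers=headers
).text
# The page stores its state as a JavaScript object assigned to window.__WEB_CONTEXT__
json_data = re.search(r"window\.__WEB_CONTEXT__=({.*});", data).group(1)
try:
    # json_data is a string: parse it with json.loads, then pretty-print the result
    print(json.dumps(json.loads(json_data), indent=4))
except json.JSONDecodeError:
    # Fall back to the raw text if the embedded object isn't strict JSON
    print(json_data)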
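To see what the regex and json.loads steps do without hitting the site, here is a small offline sketch on a made-up HTML snippet. The sample markup and the helper name extract_web_context are hypothetical; the real page's embedded object is much larger and may not be strict JSON, in which case json.loads raises json.JSONDecodeError.

```python
import json
import re

# Hypothetical snippet imitating a page that embeds its state in a <script> tag.
SAMPLE_HTML = (
    '<html><script>window.__WEB_CONTEXT__={"restaurants": ['
    '{"name": "Example Cafe", "detailPageUrl": "/Restaurant_Review-example.html"}'
    ']};</script></html>'
)


def extract_web_context(html: str):
    """Return the parsed object assigned to window.__WEB_CONTEXT__, or None if absent."""
    match = re.search(r"window\.__WEB_CONTEXT__=(\{.*\});", html)
    if match is None:
        # Marker not found, e.g. the layout changed or the request was blocked.
        return None
    # Raises json.JSONDecodeError if the captured text is not strict JSON.
    return json.loads(match.group(1))


context = extract_web_context(SAMPLE_HTML)
print(json.dumps(context, indent=4))
```

Checking for None before calling .group(1) avoids an AttributeError when the marker is missing, which is easier to diagnose than a crash.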
To get all the links:
import re
import requests

headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
}

data = requests.get(
    "https://www.tripadvisor.com.sg/Restaurants-g294265-Singapore.html", headers=headers
).text

# Each restaurant's relative link is stored under the "detailPageUrl" key
for link in re.findall(r'"detailPageUrl":"(.*?)"', data):
    # link already starts with "/", so don't add another slash after the domain
    print("https://www.tripadvisor.com.sg" + link)
Output (truncated):
https://www.tripadvisor.com.sg/Restaurant_Review-g294265-d1145149-Reviews-Grand_Shanghai_Restaurant-Singapore.html
https://www.tripadvisor.com.sg/Restaurant_Review-g294265-d1193730-Reviews-Entre_Nous_creperie-Singapore.html
https://www.tripadvisor.com.sg/Restaurant_Review-g294265-d1173583-Reviews-The_Courtyard-Singapore.html
https://www.tripadvisor.com.sg/Restaurant_Review-g294265-d4611806-Reviews-NOX_Dine_in_the_Dark-Singapore.html
https://www.tripadvisor.com.sg/Restaurant_Review-g294265-d13152787-Reviews-Positano_Risto-Singapore.html
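The same detailPageUrl value may appear more than once in the page source, so you might want to deduplicate the links while keeping their original order; urllib.parse.urljoin also takes care of the leading slash for you. A minimal offline sketch, where the sample string and the function name extract_detail_urls are hypothetical:

```python
import re
from urllib.parse import urljoin

BASE_URL = "https://www.tripadvisor.com.sg/"

# Hypothetical page-source fragment with a repeated detailPageUrl entry.
SAMPLE = (
    '{"detailPageUrl":"/Restaurant_Review-g294265-d1-Reviews-A-Singapore.html"},'
    '{"detailPageUrl":"/Restaurant_Review-g294265-d2-Reviews-B-Singapore.html"},'
    '{"detailPageUrl":"/Restaurant_Review-g294265-d1-Reviews-A-Singapore.html"}'
)


def extract_detail_urls(page_source: str, base_url: str = BASE_URL) -> list[str]:
    """Return absolute, de-duplicated detail-page URLs in first-seen order."""
    paths = re.findall(r'"detailPageUrl":"(.*?)"', page_source)
    # dict.fromkeys keeps insertion order and drops repeated paths.
    return [urljoin(base_url, path) for path in dict.fromkeys(paths)]


for url in extract_detail_urls(SAMPLE):
    print(url)
```

urljoin resolves a path starting with "/" against the site root, so the result has exactly one slash after the domain regardless of whether BASE_URL ends with one.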