Python crawling JSON - Getting all items back


I'm having trouble scraping the information I want from a particular website.

Specifically, I would like to get the titles and prices of all the sightseeing tours contained in the JSON.

So far I can get all the prices, but not all the titles: every line of output shows the same title.

I'm not sure what the problem is.

This is my code so far:

import json

import requests
from bs4 import BeautifulSoup

session = requests.Session()
session.cookies.get_dict()
url = 'http://www.citydis.com'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
response = session.get(url, headers=headers)

soup = BeautifulSoup(response.content, "html.parser")
metaConfig = soup.find("meta",  property="configuration")

# Fetch the JSON search results (this request is assumed here;
# the snippet as posted decoded the HTML response instead)
jsonUrl = "https://www.citydis.com/s/results.json?&q=London&customerSearch=1&page=0"
response = session.get(jsonUrl, headers=headers)
js_dict = json.loads(response.content.decode('utf-8'))


for item in js_dict:
   header = (js_dict['searchResults']["tours"])
   for titles in header:
       title_final = (titles.get("title"))



   url = (js_dict['searchResults']["tours"])
   for urls in url:
       url_final = (urls.get("url"))


   price = (js_dict['searchResults']["tours"])
   for prices in price:
       price_final = (prices.get("price")["original"])

       print("Header: " + title_final + " | " + "Price: " + price_final)

That's the output:

    Header: Ticket für Madame Tussauds London & Star-Wars-Erlebnis | Price: 83,66 €
    Header: Ticket für Madame Tussauds London & Star-Wars-Erlebnis | Price: 37,71 €
    Header: Ticket für Madame Tussauds London & Star-Wars-Erlebnis | Price: 152,01 €

As you can see, the prices are displayed correctly, but the titles (headers) do not differ: I get the same title on every line.

Could you help me out? Any feedback is appreciated.


Solution

  • Your for loops are incorrect. The title loop runs to completion before the price loop starts, so title_final holds only one value (the last title) by the time each price is printed. Hence the issue.

    You may want to do this instead:

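    # Titles and prices live in the same list of tour dicts.
    # (The outer "for item in js_dict" loop is dropped: it only repeated
    # the same work once per top-level key.)
    headers = js_dict['searchResults']["tours"]
    prices = js_dict['searchResults']["tours"]

    # Pair each title entry with its price entry and print inside the same loop
    for title, price in zip(headers, prices):
        title_final = title.get("title")
        price_final = price.get("price")["original"]
        print("Header: " + title_final + " | " + "Price: " + price_final)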
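Since the title, URL, and price all sit on the same tour entry, a single loop over the tours list should also work without zip. The following is only a minimal sketch: the `sample` dict and the `collect_tours` helper are hypothetical, the titles and prices are made-up placeholders, and the key layout (`searchResults` → `tours` → `title` / `url` / `price.original`) is assumed from the question's code rather than checked against the site.

```python
# Hypothetical sample mirroring the structure the question's code reads;
# the titles, URLs, and prices are placeholders, not real site data.
sample = {
    "searchResults": {
        "tours": [
            {"title": "Tour A", "url": "/tour-a", "price": {"original": "10,00 €"}},
            {"title": "Tour B", "url": "/tour-b", "price": {"original": "20,00 €"}},
            {"title": "Tour C", "url": "/tour-c", "price": {}},  # price missing
        ]
    }
}


def collect_tours(js_dict):
    """Return one (title, url, price) tuple per tour, read in a single pass."""
    rows = []
    for tour in js_dict.get("searchResults", {}).get("tours", []):
        # Fall back to "n/a" when the price dict or its "original" key is absent
        price = (tour.get("price") or {}).get("original", "n/a")
        rows.append((tour.get("title", ""), tour.get("url", ""), price))
    return rows


rows = collect_tours(sample)
for title, url, price in rows:
    print("Header: " + title + " | Price: " + price)
```

Reading every field from the same `tour` inside one loop keeps each title paired with its own price, which is the pairing the separate loops in the question lost.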