python beautifulsoup trustpilot

JSONDecodeError when scraping dates from Trustpilot with BeautifulSoup (Python)


I am using the following script to scrape user reviews from https://ca.trustpilot.com/review/www.hellofresh.ca so that I can analyze user sentiment. I expect to scrape:

Date, Star Rating, Review Content.

When I run the code, I get the following error. Can anyone explain why?

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

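import json
import time
from datetime import datetime

import requests
from bs4 import BeautifulSoup

# Placeholder: the original post does not show how headers was defined.
headers = {"User-Agent": "Mozilla/5.0"}

stars = []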
dates = []
comments = []
results = []
with requests.Session() as s:
    for num in range(1,2):
        url = "https://ca.trustpilot.com/review/www.hellofresh.ca?page={}".format(num)
        r = s.get(url, headers = headers)
        soup = BeautifulSoup(r.content, 'lxml')

        for star in soup.find_all("section", {"class":"review__content"}):

            # Get rating value
            rating = star.find("div", {"class":"star-rating star-rating--medium"}).find('img').get('alt')

            # Get date value
            #date_json = json.loads(star.find('script').text)
            #date = date_json['publishedDate']
            
            date_tag = star.select("div.review-content-header__dates > script")    
            date = json.loads(date_tag[0].text)
            dt = datetime.strptime(date['publishedDate'], "%Y-%m-%dT%H:%M:%SZ")
            
            
            # Get comment
            comment = star.find("div", class_="review-content__body").text

            stars.append(rating)
            dates.append(dt)
            comments.append(comment)

            data = {"Rating": rating, "Review": comment, "Dates": date}
            results.append(data)

        time.sleep(2)


print(results)



Solution

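  • To get the JSON data, read the tag's .string attribute instead of .text. Some Beautiful Soup releases (starting with 4.9.0) do not treat the contents of a <script> tag as text, so .text can return an empty string, and json.loads() then fails with "Expecting value: line 1 column 1 (char 0)".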

    ...
    
    date = json.loads(date_tag[0].string)
    print(date)
    # {'publishedDate': '2021-01-04T21:57:34+00:00', 'updatedDate': None, 'reportedDate': None}
    
    ...
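
As a follow-up to the output above, here is a minimal standard-library sketch of two points. First, passing an empty string to json.loads() reproduces the error from the question. Second, the publishedDate value ends in "+00:00", so the "%Y-%m-%dT%H:%M:%SZ" format used in the question would likely fail once the JSON loads. The variable names (raw, error_message, z_format_matches) are illustrative only, and raw is simply the printed payload re-encoded as JSON text. datetime.fromisoformat() is one possible replacement for strptime here, not necessarily what the original answer intended.

```python
import json
from datetime import datetime

# Payload copied from the output shown above, re-encoded as JSON text.
raw = '{"publishedDate": "2021-01-04T21:57:34+00:00", "updatedDate": null, "reportedDate": null}'

# An empty string (what an empty .text hands to json.loads) reproduces the error.
try:
    json.loads("")
except json.JSONDecodeError as exc:
    error_message = str(exc)

date = json.loads(raw)

# The "+00:00" offset does not match the literal "Z" in the format string,
# so strptime raises ValueError.
try:
    datetime.strptime(date["publishedDate"], "%Y-%m-%dT%H:%M:%SZ")
    z_format_matches = True
except ValueError:
    z_format_matches = False

# fromisoformat (Python 3.7+) accepts the offset and returns a timezone-aware datetime.
dt = datetime.fromisoformat(date["publishedDate"])

print(error_message)
print(z_format_matches)
print(dt.isoformat())
```

In the scraping loop, this would mean replacing the strptime call with datetime.fromisoformat(date['publishedDate']), assuming every review uses the same timestamp format.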