Web scraping from Amazon is generating an error

I've been trying to scrape an Amazon product page to extract the price. I've only used Requests to fetch the page, but I always get the same error:

"To discuss automated access to Amazon data please contact api-services-support@amazon.com. For information about migrating to our APIs refer to our Marketplace APIs at https://developer.amazonservices.com/ref=rm_c_sv, or our Product Advertising API at https://affiliate-program.amazon.com/gp/advertising/api/detail/main.html/ref=rm_c_ac for advertising use cases.

Sorry, we just need to make sure you're not a robot. For best results, please make sure your browser is accepting cookies."

What can I do?

import requests

# Headers meant to make the request look like it comes from Chrome
header = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "en-US,en;q=0.9",
    "Upgrade-Insecure-Requests": "1",
    "Referer": "https://www.google.com/",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36"
}
url = 'https://www.amazon.com/-/es/Revlon-One-Step-Volumizer-PLUS/dp/B096SVJZSW/?_encoding=UTF8&content-id=amzn1.sym.3f4ca281-e55c-46d1-9425-fb252d20366f&ref_=pd_gw_exports_top_sellers_unrec'

response = requests.get(url, headers=header)

data = response.text
print(data)  # prints the robot-check message instead of the product page
print(response.status_code)
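Whatever fetching approach you end up using, it helps to detect programmatically when Amazon has served the robot-check page instead of the product page, since the status code alone may not be a reliable signal. A minimal sketch: the hypothetical helper `is_robot_check` just looks for phrases taken from the message quoted above. The marker list is an assumption and may need updating if Amazon changes its wording.

```python
# Phrases taken from the robot-check message quoted in the question.
# Hypothetical marker list: Amazon may change this wording at any time.
ROBOT_CHECK_MARKERS = (
    "api-services-support@amazon.com",
    "not a robot",
)


def is_robot_check(html: str) -> bool:
    """Return True if the HTML looks like Amazon's robot-check page."""
    text = html.lower()
    return any(marker.lower() in text for marker in ROBOT_CHECK_MARKERS)


blocked = "Sorry, we just need to make sure you're not a robot."
normal = "<html><span class='a-offscreen'>US$39.97</span></html>"
print(is_robot_check(blocked))  # True
print(is_robot_check(normal))   # False
```

With the code above, you would call `is_robot_check(response.text)` right after `requests.get(...)` and stop (rather than parse) when it returns True.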

Solution

Amazon sets cookies while you browse its site to make sure you are using a real browser (open your browser's developer tools and look under Application > Cookies).

If you use requests directly, the session ends up with no cookies at all:

#!/usr/bin/env python3
import requests

s = requests.Session()  # a Session keeps any cookies the server sets
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36'}
url = "https://www.amazon.com/-/es/Revlon-One-Step-Volumizer-PLUS/dp/B096SVJZSW/?_encoding=UTF8&content-id=amzn1.sym.3f4ca281-e55c-46d1-9425-fb252d20366f&ref_=pd_gw_exports_top_sellers_unrec"
r = s.get(url, headers=headers)
print(s.cookies.get_dict())  # cookies Amazon set during this request

{}
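Since the session's cookie jar stays empty, one experiment is to copy the `Cookie` request header from the browser's developer tools (Network tab) and hand it to requests. The sketch below covers only the parsing step; the helper name `parse_cookie_header` and the sample cookie names and values are illustrative assumptions, and there is no guarantee that reused browser cookies get past Amazon's check or stay valid for long.

```python
from __future__ import annotations


def parse_cookie_header(header: str) -> dict[str, str]:
    """Turn a 'name1=value1; name2=value2' Cookie header into a dict."""
    cookies = {}
    for part in header.split(";"):
        part = part.strip()
        if not part or "=" not in part:
            continue  # skip empty fragments and pieces without a value
        # Split on the first '=' only: cookie values may themselves contain '='.
        name, value = part.split("=", 1)
        cookies[name.strip()] = value.strip()
    return cookies


# Hypothetical header text pasted from DevTools (names and values made up).
copied = "session-id=000-0000000-0000000; i18n-prefs=USD; csm-hit=tb:abc=="
print(parse_cookie_header(copied))
# {'session-id': '000-0000000-0000000', 'i18n-prefs': 'USD', 'csm-hit': 'tb:abc=='}
```

The resulting dict can be passed to requests through its `cookies` argument, e.g. `s.get(url, headers=headers, cookies=parse_cookie_header(copied))`.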

So Amazon has something in place that blocks plain Python requests, even if you spoof the User-Agent.

Options:

1 - Amazon offers a pricing API (https://developer-docs.amazon.com/sp-api/docs/product-pricing-api-v0-reference#getpricing), limited to 0.5 requests per second on the free usage plan. Even so, I'd recommend it over a slow browser.

2 - You can drive a real browser. It is rather slow, but working with Playwright is a lot of fun:

#!/usr/bin/env python3
import time
from playwright.sync_api import sync_playwright

url = "https://www.amazon.com/-/es/Revlon-One-Step-Volumizer-PLUS/dp/B096SVJZSW/?_encoding=UTF8&content-id=amzn1.sym.3f4ca281-e55c-46d1-9425-fb252d20366f&ref_=pd_gw_exports_top_sellers_unrec"

with sync_playwright() as p:
    t0 = time.time()
    browser = p.chromium.launch(headless=True)  # set headless=False to watch the browser while debugging
    page = browser.new_page()
    page.goto(url)
    #print(page.title())
    price = page.locator("(//span[@class='a-price a-text-price']/span[@class='a-offscreen'])[1]").text_content()
    print(f"{price} in {time.time()-t0:.2f}sec")
    # page.pause()  # pauses here (with headless=False) so you can inspect the page before the browser closes
    browser.close()

US$39.97 in 10.26sec
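`text_content()` returns the price as display text such as `US$39.97` (or `None` if the element has no text). If you need a number, a small parser can strip the currency prefix. This sketch assumes US-style formatting (comma for thousands, dot for decimals) as in the output above; a page in another locale may format prices differently, so treat `parse_price` as a hypothetical helper to adapt.

```python
import re
from decimal import Decimal
from typing import Optional


def parse_price(text: Optional[str]) -> Optional[Decimal]:
    """Extract a Decimal from display text like 'US$1,299.99'; None if absent."""
    if not text:
        return None
    # First run of digits, optionally with thousands commas and a decimal part.
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    # Drop thousands separators before converting (US-style assumption).
    return Decimal(match.group().replace(",", ""))


print(parse_price("US$39.97"))  # 39.97
```

`Decimal` is used instead of `float` so the value keeps exactly the digits shown on the page.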

https://playwright.dev/python/
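For option 1, the 0.5 requests-per-second limit means waiting at least two seconds between API calls. A minimal client-side throttle sketch, assuming you call `wait()` once before each request; the `Throttle` class is a hypothetical helper, not part of any Amazon SDK, and the clock and sleep functions are injectable only so the logic can be tested without real waiting.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Block so that successive calls are at least 1/rate seconds apart."""

    def __init__(self, rate_per_sec: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = 1.0 / rate_per_sec  # 0.5 req/s -> 2.0 s
        self.clock = clock
        self.sleep = sleep
        self.last_call: Optional[float] = None

    def wait(self) -> None:
        """Sleep just long enough to respect the minimum interval."""
        now = self.clock()
        if self.last_call is not None:
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now


throttle = Throttle(rate_per_sec=0.5)
# Call throttle.wait() immediately before each pricing API request.
```

Using `time.monotonic` rather than `time.time` keeps the spacing correct even if the system clock is adjusted.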