python playwright playwright-python

Getting response.status and response.url from the site visited


I am trying to get page.url, response.url and response.status for a website I visit. This is what I have tried:

from playwright.sync_api import sync_playwright

def scrape_page(url):
  
  with sync_playwright() as p:
      browser = p.chromium.launch()
      page = browser.new_page()
      page.goto(url)
      outs = {"url": page.url, "res_url": response.url, "res_status": response.status}
      browser.close()
      
  return outs

On its own, "url": page.url works. But when I add "res_url": response.url, "res_status": response.status, I get the following error:

>>> scrape_page("https://theguardian.co.uk")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 7, in scrape_page
NameError: name 'response' is not defined

Any idea how to fix the mistake?


Solution

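  • When something isn't defined, define it. page.goto() returns the Response for the main navigation, so assign its return value to a variable: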

    from playwright.sync_api import sync_playwright  # 1.44.0
    
    
    def scrape_page(url):
        with sync_playwright() as p:
            browser = p.chromium.launch()
            page = browser.new_page()
            response = page.goto(url)
            # ^^^^^^^^^
    
            outs = {"url": page.url, "res_url": response.url, "res_status": response.status}
            browser.close()
            return outs
    
    
    print(scrape_page("https://www.example.com"))
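
One caveat worth guarding against: page.goto() can return None (for example when navigating to about:blank, or to the same URL with only a different hash), in which case response.url would raise an AttributeError. Below is a minimal sketch of a more defensive variant. The helper name summarize_navigation is hypothetical and only there for illustration, and the try/finally is one possible way to make sure the browser is closed even if the navigation fails.

```python
def summarize_navigation(page_url, response):
    """Build the result dict; `response` may be None (hypothetical helper)."""
    if response is None:
        # No main-resource response is available for this navigation.
        return {"url": page_url, "res_url": None, "res_status": None}
    return {"url": page_url, "res_url": response.url, "res_status": response.status}


def scrape_page(url):
    # Imported here so the helper above can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page()
            response = page.goto(url)  # a Response, or None in the cases noted above
            return summarize_navigation(page.url, response)
        finally:
            # Runs even if goto() raises (e.g. on a timeout).
            browser.close()
```

Calling scrape_page("https://www.example.com") should then return the same three keys as before, with None for the response fields when no response is available.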