I am trying to get the `page.url`, `response.url` and `response.status` from the websites. This is what I am trying:
```python
from playwright.sync_api import sync_playwright

def scrape_page(url):
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)
        outs = {"url": page.url, "res_url": response.url, "res_status": response.status}
        browser.close()
        return outs
```
On its own, `"url": page.url` works. But if I add `"res_url": response.url, "res_status": response.status`, I get the following error:
```
>>> scrape_page("https://theguardian.co.uk")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 7, in scrape_page
NameError: name 'response' is not defined
```
Any idea how to fix the mistake?
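When something isn't defined, define it. `response` is never assigned in your function, but `page.goto()` returns the main resource's `Response`, so keep its return value: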
```python
from playwright.sync_api import sync_playwright # 1.44.0

def scrape_page(url):
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        response = page.goto(url)
        # ^^^^^^^^^
        outs = {"url": page.url, "res_url": response.url, "res_status": response.status}
        browser.close()
        return outs

print(scrape_page("https://www.example.com"))
```
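One caveat: `page.goto()` is typed as returning an optional response, and Playwright documents that it can return `None` (for example when navigating to `about:blank`, or to the same URL with a different hash). In that case `response.url` raises `AttributeError` instead of `NameError`. A minimal sketch of a guard, assuming you want `None` values in the dict rather than an exception; `summarize_navigation` and `ResponseLike` are hypothetical names, not part of Playwright:

```python
from types import SimpleNamespace
from typing import Optional, Protocol


class ResponseLike(Protocol):
    """Hypothetical protocol: the two Response attributes the question reads."""

    @property
    def url(self) -> str: ...

    @property
    def status(self) -> int: ...


def summarize_navigation(page_url: str, response: Optional[ResponseLike]) -> dict:
    """Build the question's result dict without crashing on a missing response."""
    if response is None:
        # page.goto() gave no response, so only the page URL is known.
        return {"url": page_url, "res_url": None, "res_status": None}
    return {"url": page_url, "res_url": response.url, "res_status": response.status}


if __name__ == "__main__":
    # Stand-in object instead of a real Playwright Response, so this runs offline.
    fake = SimpleNamespace(url="https://www.example.com/", status=200)
    print(summarize_navigation("https://www.example.com/", fake))
    print(summarize_navigation("about:blank", None))
```

Inside `scrape_page` you would then write `outs = summarize_navigation(page.url, response)` right after `response = page.goto(url)`.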