I am integrating Scrapy with Playwright but am having difficulty adding a timer after a click. As a result, when I take a screenshot of the page after the click, it still shows the log-in page.
How can I add a timer so that the page waits a few seconds until it has loaded?
The selector .onetrust-close-btn-handler.onetrust-close-btn-ui.banner-close-button.onetrust-lg.ot-close-icon below was replaced with .onetrust-close-btn-handler.
import scrapy
from scrapy_playwright.page import PageCoroutine


class DoorSpider(scrapy.Spider):
    name = 'door'
    start_urls = ['https://nextdoor.co.uk/login/']

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(
                url=url,
                callback=self.parse,
                meta=dict(
                    playwright=True,
                    playwright_include_page=True,
                    playwright_page_coroutines=[
                        PageCoroutine("click", selector=".onetrust-close-btn-handler"),
                        PageCoroutine("fill", "#id_email", 'my_email'),
                        PageCoroutine("fill", "#id_password", 'my_password'),
                        PageCoroutine('waitForNavigation'),
                        PageCoroutine("click", selector="#signin_button"),
                        PageCoroutine("screenshot", path="cookies.png", full_page=True),
                    ]
                )
            )

    def parse(self, response):
        yield {
            'data': response.body
        }
There are many wait methods you can use, depending on your particular use case. Below is a sample, but you can read more in the docs:
wait_for_event(event, **kwargs)
wait_for_selector(selector, **kwargs)
wait_for_load_state(**kwargs)
wait_for_url(url, **kwargs)
wait_for_timeout(timeout)
For your question, if you need to wait until the page loads, you can use the page method below and insert it at the appropriate place in your list (after the click that triggers the navigation):
...
PageMethod("wait_for_load_state", "load"),
...
or
...
PageMethod("wait_for_load_state", "domcontentloaded"),
...
You can try any of the other wait methods if the two above don't work, or you can use an explicit timeout such as 3 seconds (3000 ms). This is not recommended, as it will fail more often and is not optimal when web scraping.
...
PageMethod("wait_for_timeout", 3000),
...
Pass these methods inside meta, in the playwright_page_methods list, like this:
from scrapy_playwright.page import PageMethod
...
def start_requests(self):
    for url in self.start_urls:
        yield scrapy.Request(url, meta=dict(
            ...,
            playwright_page_methods=[
                PageMethod("wait_for_load_state", "domcontentloaded"),
                ...
            ]
        ))
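To make the ordering concrete, here is a minimal sketch of the full list for the login flow in the question, with the wait placed after the sign-in click and before the screenshot. The helper name build_login_meta is hypothetical, the selectors are copied from the question, and the choice of "load" as the state to wait for is an assumption. The try/except fallback is only there so the sketch runs without scrapy-playwright installed; it mirrors the (method, *args, **kwargs) shape and nothing else.

```python
# Fall back to a minimal stand-in so the sketch runs without scrapy-playwright
# installed; the stand-in only stores the method name and its arguments.
try:
    from scrapy_playwright.page import PageMethod
except ImportError:
    class PageMethod:  # hypothetical stand-in, not the real class
        def __init__(self, method, *args, **kwargs):
            self.method = method
            self.args = args
            self.kwargs = kwargs


def build_login_meta(email, password):
    """Hypothetical helper: build the request meta for the login flow."""
    return dict(
        playwright=True,
        playwright_include_page=True,
        playwright_page_methods=[
            # Dismiss the cookie banner, then fill in the form.
            PageMethod("click", selector=".onetrust-close-btn-handler"),
            PageMethod("fill", "#id_email", email),
            PageMethod("fill", "#id_password", password),
            # Submit first, then wait, then take the screenshot.
            PageMethod("click", selector="#signin_button"),
            PageMethod("wait_for_load_state", "load"),
            PageMethod("screenshot", path="cookies.png", full_page=True),
        ],
    )


meta = build_login_meta("my_email", "my_password")
print([m.method for m in meta["playwright_page_methods"]])
```

You would then pass meta=build_login_meta(...) to scrapy.Request in start_requests. If the screenshot still shows the log-in page, the load state may already have been reached before the navigation started; in that case waiting for something specific to the logged-in page, with wait_for_selector or wait_for_url, is likely to be more reliable.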