I have a working scraper, but I have trouble closing the pop-up. The pop-up only appears in certain cases, so I need to handle it when it does.
I have tried finding the button by its attributes and clicking "Accept All".
The bold portion in the code is what I have tried.
import asyncio
from pyppeteer import launch
import time
from datetime import datetime, timedelta
import pandas as pd
async def filter_by_url(url):
    browser = await launch(
        {
            "headless": False,
            'args': ['--start-maximized'],
            # 'executablePath':'/usr/bin/google-chrome'
        }
    )
    # url = "https://www.justwatch.com/us/provider/netflix?sort_by=trending_7_day"
    page = await browser.newPage()
    await page.setViewport({'width': 1920, 'height': 1080})
    await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3')
    await page.goto(url)
    ## Scroll To Bottom
    # **time.sleep(5)
    #await page.waitFor('footer span[data-icon="Accept all"]')
    #await page.click('button:has-text("Accept all")');
    #await Page.locator('uc-accept-all-button').first().click();**
    while True:
        await page.evaluate("""{window.scrollBy(0, document.body.scrollHeight);}""")
        time.sleep(2)
        end_point = await page.querySelector(".timeline__end-of-timeline")
        if end_point:
            print("reached to end points")
            break
# Run the function
urls = [
    'https://www.justwatch.com/ca/provider/netflix?sort_by=trending_7_day'
]
for url in urls:
    asyncio.get_event_loop().run_until_complete(filter_by_url(url))
Your button is placed inside a shadow root. To reach the internal shadow root structure, you should get its host first and then read the host's shadowRoot property.
The shadow host has the selector #usercentrics-root.
You should wait for the host content to load and then click the internal button. If the content has not been rendered yet, retry after a timeout.
After that, it is good practice to wait for the host to be hidden.
await page.evaluate("""function acceptConsent() {
    let accept = document.querySelector('#usercentrics-root').shadowRoot.querySelector('[data-testid=uc-accept-all-button]');
    if (accept) {
        accept.click();
        return;
    }
    setTimeout(acceptConsent, 500);
}
""")
await page.waitForSelector('#usercentrics-root', options={'visible': False})
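Since the popup only appears in certain cases, it may help to bound the retries so the scraper carries on when there is nothing to dismiss. The following is a minimal sketch of that idea, not a tested drop-in: the retry loop is driven from Python rather than from setTimeout inside the page. The names dismiss_consent, CLICK_ACCEPT_JS, attempts and delay are hypothetical; the selectors are the ones given above, and page is assumed to be a pyppeteer page.

```python
import asyncio

# JavaScript evaluated in the page. It returns true once the button inside
# the shadow root has been found and clicked, and false otherwise
# (host missing, shadow root missing, or button not rendered yet).
CLICK_ACCEPT_JS = """() => {
    const host = document.querySelector('#usercentrics-root');
    const button = host && host.shadowRoot &&
        host.shadowRoot.querySelector('[data-testid=uc-accept-all-button]');
    if (!button) return false;
    button.click();
    return true;
}"""


async def dismiss_consent(page, attempts=10, delay=0.5):
    """Try to click the consent button up to `attempts` times.

    Returns True if the button was clicked, False if it never appeared.
    `attempts` and `delay` (seconds) are illustrative defaults.
    """
    for _ in range(attempts):
        if await page.evaluate(CLICK_ACCEPT_JS):
            return True
        # Yield to the event loop instead of blocking it with time.sleep.
        await asyncio.sleep(delay)
    return False
```

It could be called right after page.goto(url), for example as `await dismiss_consent(page)`, and the scroll loop can run regardless of the return value.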