Tags: python, web-scraping, playwright, playwright-python

Python Playwright unable to access elements


I want to scrape the words that reside in the <li> elements, but the result is an empty list. Are the elements inside a frame? As far as I can see, they are not within any <iframe></iframe> element. If they are, how do I access the frame or find its id in this case? Here is the site and the code:

from playwright.sync_api import sync_playwright, expect


def test_fetch_paperrater():
    path = r"https://www.paperrater.com/page/lists-of-adjectives"
    with sync_playwright() as playwright:
        browser = playwright.chromium.launch()
        page = browser.new_page()
        page.goto(path)
        texts = page.locator("div#header-container article.page ul li").all_inner_texts()
        print(texts)
        browser.close()
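
One way to tell a wrong selector apart from content that lives inside a frame is to count the matches and list the page's frames. The following is a minimal sketch, not a definitive diagnostic: describe_page and main are hypothetical helper names, and the shorter selector "article.page ul li" is only an assumed alternative to try. Note that a page without any <iframe> still reports one frame, the main frame.

```python
def describe_page(page, selector):
    """Return how many elements match `selector` and which frames the page has."""
    return {
        # Locator.count() is 0 when the selector matches nothing in the main frame.
        "matches": page.locator(selector).count(),
        # page.frames always contains the main frame, plus one entry per iframe.
        "frames": [(frame.name, frame.url) for frame in page.frames],
    }


def main():
    # Imported here so describe_page can be reused without launching a browser.
    from playwright.sync_api import sync_playwright

    url = "https://www.paperrater.com/page/lists-of-adjectives"
    with sync_playwright() as playwright:
        browser = playwright.chromium.launch()
        page = browser.new_page()
        page.goto(url)
        # The selector from the question, then a less specific one (an assumption).
        print(describe_page(page, "div#header-container article.page ul li"))
        print(describe_page(page, "article.page ul li"))
        browser.close()
```

Calling main() prints both reports. If the first count is 0, the second is positive, and only the main frame is listed, the problem is the selector rather than a frame. If the elements did sit inside an iframe, page.frame_locator("<iframe selector>") would be the usual way to reach them.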

Solution

  • The elements were not in div#header-container but in div#wrapper. The page has multiple ul elements, and the best way I found to access the ones that hold the words was with nth(), as follows:

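    from playwright.sync_api import sync_playwright

    path = r"https://www.paperrater.com/page/lists-of-adjectives"
    with sync_playwright() as playwright:
        browser = playwright.chromium.launch()
        page = browser.new_page()
        page.goto(path)
        words = []
        # Visit every second ul, at indices 1, 3, ..., 21.
        for i in range(1, 22, 2):
            all_texts = page.locator("div#wrapper article.page ul").nth(i).all_inner_texts()
            # nth(i) matches a single ul, so the list holds one newline-separated string.
            texts = all_texts[0].split("\n")
            for text in texts:
                words.append(text)
        browser.close()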
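
The splitting step can also be factored into a small helper, so it can be checked without opening a browser. This is a sketch under assumptions: split_inner_texts and scrape_words are hypothetical names, the selector and indices are simply the ones from the answer above, and stripping whitespace and dropping blank lines is an extra the original loop does not do.

```python
def split_inner_texts(blocks):
    """Split each inner-text block on newlines and drop blank entries."""
    words = []
    for block in blocks:
        for line in block.split("\n"):
            line = line.strip()
            if line:
                words.append(line)
    return words


def scrape_words(url, selector="div#wrapper article.page ul", indices=range(1, 22, 2)):
    # Imported lazily so split_inner_texts works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as playwright:
        browser = playwright.chromium.launch()
        page = browser.new_page()
        page.goto(url)
        lists = page.locator(selector)
        # inner_text() returns the rendered text of the single ul picked by nth(i).
        blocks = [lists.nth(i).inner_text() for i in indices]
        browser.close()
    return split_inner_texts(blocks)


# Offline check of the helper with made-up sample text.
print(split_inner_texts(["able\nbad\n", " calm \n\ndark"]))
# ['able', 'bad', 'calm', 'dark']
```

Calling scrape_words("https://www.paperrater.com/page/lists-of-adjectives") should then return the same words as the loop above, minus any empty strings.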