python, html, playwright

Get data with Playwright from a number of <div> elements


I'm trying to find an example of using Playwright to scrape data from the web. The data on the page is spread across numerous <div> tags. I need to extract the text of the <span> tags inside the product-key-value elements into an array. The majority of the code was obtained from here.

This is one attempt, but it didn't work:

product_section = page.locator('//div[@class="product-key-value"]')
spans = product_section.locator('span')
span_elements = spans.element_handles()

data_array = [span.text_content() for span in span_elements]
print(data_array)

There is a similar solution, but it did not help in my case:

This is a screenshot of the target page


Solution

  • Solution found! With the 'locator' method, we get all the elements between the product-card-info tags, then use a 'for' loop to collect their text into an array or print it on screen. In my case the locator looks like '#main > div > mp-main > mp-product-page > div > div > div > product-page-default-layout > product-card > div > div >.....

    all_links = await page.locator("locator copied from html").all()
    for link in all_links:
        text = await link.inner_text()
        print(text)
    

    Note that this uses a CSS selector, not an XPath expression.
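For anyone who wants to try the approach end to end, here is a minimal sketch of the same idea using Playwright's async API. The sample markup, the `div.product-key-value span` selector, and the names `SAMPLE_HTML`, `collect_texts` and `main` are all assumptions made up for illustration; the real page will almost certainly need a different selector, copied from its own HTML as described above.

```python
import asyncio

# Hypothetical markup standing in for the real product page (an assumption).
SAMPLE_HTML = """
<div class="product-key-value"><span>Brand</span><span>Acme</span></div>
<div class="product-key-value"><span>Weight</span><span>2 kg</span></div>
"""


async def collect_texts(page, selector):
    """Return the stripped inner text of every element matching `selector`."""
    # all_inner_texts() resolves every match and returns a list of strings.
    texts = await page.locator(selector).all_inner_texts()
    return [text.strip() for text in texts]


async def main():
    # Imported here so collect_texts can be reused without a browser installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        # Load the sample markup instead of navigating to a real site.
        await page.set_content(SAMPLE_HTML)
        data_array = await collect_texts(page, "div.product-key-value span")
        print(data_array)
        await browser.close()


# To try it against a real browser, call: asyncio.run(main())
```

The helper takes the page as an argument, so the same function can be pointed at a real page after `page.goto(...)` with whatever selector matches that site. It uses `all_inner_texts()` instead of looping over `all()`, which should give the same texts in a single call.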