python playwright playwright-python

Can't find element by XPath in Playwright


I am trying to get all search results (URLs) from https://docs.vgd.ru/search/?v=1. I am using the XPath //a[@class='resultsMain'] to find them. The XPath is valid.

My code:

import asyncio
import time
from playwright.async_api import async_playwright


class VgdParser:
    def __init__(self, headless: bool = False):
        self.headless = headless
        self.browser = None
        self.playwright = None
        self.page = None

    
    async def start_browser(self):
        """Start the browser and create a new page with stealth mode"""
        self.playwright = await async_playwright().start()
        self.browser = await self.playwright.chromium.launch(
            headless=self.headless,
        )
        self.page = await self.browser.new_page()
        await self.page.goto("https://docs.vgd.ru/search/?v=1")
    
    async def search_by_name(self, name: str):
        """Enter name in https://docs.vgd.ru/search/?v=1 and collect results"""
        # Wait for the iframe to appear
        # Get the iframe (case-insensitive id)
        frame = self.page.frame(name="iFrame1")
        
        if frame is None:
            raise Exception("iframe with id 'iFrame1' not found")
        # Wait for the input inside the iframe
        input_locator = frame.locator('//input[@placeholder="Введите запрос"]')
        await input_locator.fill(name)
        await input_locator.press('Enter')

        frame = self.page.frame(name="iFrame1")
        if frame is None:
            raise Exception("iframe with id 'iFrame1' not found")
        results_raw = frame.locator("//a[@class='resultsMain']")
        count = await results_raw.count()
        print("XXX_ ", count) # PRINTS 0
        for i in range(count):
            cur_result = results_raw.nth(i)
            text = await cur_result.inner_text()
            print("Result:", text)

async def main():
    parser = VgdParser()
    await parser.start_browser()
    await parser.search_by_name("Алексей Ермаков")

if __name__ == "__main__":
    asyncio.run(main())

The problem is that the line print("XXX_ ", count) in the function search_by_name prints 0, meaning that no elements were found.


Solution

  • A few thoughts:

    Here's a minimal rewrite that you can build your own abstractions on, if you need to:

    from playwright.sync_api import sync_playwright  # 1.53.0
    
    with sync_playwright() as playwright:
        browser = playwright.chromium.launch()
        page = browser.new_page()
        page.goto("https://vgd.ru/search2", wait_until="commit")
        page.get_by_placeholder("Введите запрос").fill("Алексей Ермаков")
        page.keyboard.press("Enter")
        results = page.locator(".resultsMain")
        results.first.wait_for()
        print(results.all_text_contents())
    

    Output:

    ['Colesnik', 'Pacific', 'masterbos', 'yokainfromabyss', 'OlgaDoronina1983', ... ]
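
Since the question uses the async API, the same approach might look roughly like this in async form. Treat it as a sketch rather than a tested drop-in: `collect_results` is a hypothetical helper name, and the URL, placeholder text and `.resultsMain` selector are simply copied from the snippet above. The relevant detail is that `locator.count()` returns immediately instead of waiting for elements to appear, which would explain the 0 in the question, so the helper waits for the first match before reading anything.

```python
import asyncio


async def collect_results(page_or_frame, selector=".resultsMain"):
    """Wait for the first match, then return the text of every match.

    page_or_frame is expected to be a Playwright Page or Frame (async API).
    """
    results = page_or_frame.locator(selector)
    # count() and all_text_contents() do not wait for elements,
    # so block here until at least one result is visible.
    await results.first.wait_for()
    return await results.all_text_contents()


async def main():
    # Imported here so collect_results can be reused without starting Playwright.
    from playwright.async_api import async_playwright

    async with async_playwright() as playwright:
        browser = await playwright.chromium.launch()
        page = await browser.new_page()
        await page.goto("https://vgd.ru/search2", wait_until="commit")
        await page.get_by_placeholder("Введите запрос").fill("Алексей Ермаков")
        await page.keyboard.press("Enter")
        print(await collect_results(page))
        await browser.close()
```

To try it against the live site, call `asyncio.run(main())`. The helper only needs an object with a `locator()` method, so it should work with the `frame` from the question as well as with a page.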
    

    It's probably worth checking whether you can intercept a network request or hit an API directly here. I haven't investigated this; I'm only noting that the code above isn't necessarily the optimal strategy, just a first-step improvement over what you currently have.
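
As a starting point for that exploration, one could log the XHR/fetch responses the page receives while the search runs and look for one that carries the results. This is only a sketch: whether the site exposes a usable endpoint is unknown, `describe_response` and `log_response` are hypothetical helper names, and filtering on the `xhr` and `fetch` resource types is an assumption about how the results might be loaded.

```python
def describe_response(resource_type, status, url, content_type):
    """Return a one-line summary for XHR/fetch responses, or None for anything else."""
    if resource_type not in ("xhr", "fetch"):
        return None
    return f"{status} {content_type or 'unknown'} {url}"


def main():
    # Imported here so describe_response can be reused without starting Playwright.
    from playwright.sync_api import sync_playwright

    def log_response(response):
        summary = describe_response(
            response.request.resource_type,
            response.status,
            response.url,
            response.headers.get("content-type"),
        )
        if summary is not None:
            print(summary)

    with sync_playwright() as playwright:
        browser = playwright.chromium.launch()
        page = browser.new_page()
        # Register the listener before navigating so early requests are seen too.
        page.on("response", log_response)
        page.goto("https://vgd.ru/search2", wait_until="commit")
        page.get_by_placeholder("Введите запрос").fill("Алексей Ермаков")
        page.keyboard.press("Enter")
        # Wait for the results so the requests that produce them have completed.
        page.locator(".resultsMain").first.wait_for()
        browser.close()
```

Calling `main()` prints one line per XHR/fetch response. If one of them turns out to contain the search results, requesting that URL directly may be simpler than driving a browser.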