python web-scraping playwright playwright-python

Count of pages found in Google search


I am looking for the count of pages where the search term "indieea" is found. I visited this page:

https://www.google.com/search?q=%22indieea%22

Go to the last page of the search results. You get this line:

we have omitted some entries very similar to the 64 already displayed.

The function should return 64 because there are 64 pages for the given search term "indieea".

The code that I tried:

import asyncio
import urllib.parse
from playwright.async_api import async_playwright  # 1.44.0

async def main():
    term = urllib.parse.quote_plus("टंकलेखन")
    url = f"https://www.google.com/search?q={term}"

    async with async_playwright() as pw:
        browser = await pw.chromium.launch()
        page = await browser.new_page()
        await page.goto(url, wait_until="domcontentloaded")

        # Find the <a> tag with aria-label="Page 9" and class="fl"
        link_element = page.locator('a[aria-label="Page 9"].fl').first
        if await link_element.count() > 0:  # Check if the element exists
            # Get the outerHTML of the element to print the full source code of the link
            link_html = await link_element.evaluate('el => el.outerHTML')
            print("Source code of the link:", link_html)
        else:
            print("Element not found.")

        await browser.close()

if __name__ == "__main__":
    asyncio.run(main())

Solution

  • Not sure if this contributes anything, but if you press the "Tools" button on Google you can see how many results were found. If you wrap your search term in "", it will look for exact matches. I'm seeing 913 hits.

    In fact, this data point is in the returned HTML, and you can search the returned page for:

    document.getElementById("result-stats").innerText
    

    or via your favorite Python HTML parsing package.

    At the moment this returns:

    About 918 results (0.21 seconds)
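
To turn that string into a number in Python, something like the sketch below might work. It assumes the stats text keeps the "About N results (...)" shape shown above and that the element id is still result-stats; Google can change its markup or serve a consent page instead, so treat both as assumptions. The names parse_result_count and fetch_result_stats are made up for illustration. Note that this gives Google's estimate, which is not necessarily the same as the number in the "omitted some entries" line.

```python
import re
from typing import Optional

# First run of digits (with optional thousands separators) followed by "result(s)".
_COUNT_RE = re.compile(r"(\d[\d,.\s]*)\s*results?", re.IGNORECASE)


def parse_result_count(stats_text: str) -> Optional[int]:
    """Extract the result count from text like 'About 918 results (0.21 seconds)'."""
    match = _COUNT_RE.search(stats_text)
    if match is None:
        return None
    # Drop separators such as commas, dots, and spaces before converting.
    return int(re.sub(r"\D", "", match.group(1)))


async def fetch_result_stats(query: str) -> Optional[str]:
    """Hypothetical helper: read the text of #result-stats with Playwright."""
    # Imported lazily so the parser above can be used without Playwright installed.
    from urllib.parse import quote_plus
    from playwright.async_api import async_playwright

    url = f"https://www.google.com/search?q={quote_plus(query)}"
    async with async_playwright() as pw:
        browser = await pw.chromium.launch()
        page = await browser.new_page()
        await page.goto(url, wait_until="domcontentloaded")
        stats = page.locator("#result-stats")
        # text_content() also works when the element is present but not visible.
        text = await stats.text_content() if await stats.count() > 0 else None
        await browser.close()
        return text


print(parse_result_count("About 918 results (0.21 seconds)"))  # → 918
```

To run it against the live page you could call asyncio.run(fetch_result_stats('"indieea"')) and pass the returned text to parse_result_count, checking for None first in case the element was not found.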