google-apps-scriptgoogle-sheetsweb-scrapinggoogle-sheets-formulagoogle-search

Grab first URL off a Google search result


I'm trying to use the following formula to retrieve the first URL off a Google search result, but I'm getting the error that Imported content is empty. :

=IMPORTXML("https://www.google.com/search?q=cnn&num=1", "//cite")

It looks like the result URL is contained within an cite tag in the results page, but the formula isn't working as expected. I expected this formula would return https://www.cnn.com. Same formula with //h3 as XPath does return the title.

Any idea what I am doing wrong?


Solution

  • Issue:

    The cite tag is generated dynamically via script, making it unreachable to formulas like IMPORTXML.

    You can see that's the case if you disable JavaScript and refresh the page. You won't see that element anymore.

    Workaround:

    The URL is also available in a div that is sibling of the h3 element containing the title. This element is not injected dynamically via script, so it can be accessed by IMPORTXML. Retrieve that instead:

    =IMPORTXML("https://www.google.com/search?q=cnn&num=1", "//h3/following-sibling::div")
    

    enter image description here

    Related: