I have an Excel workbook of around 25,000 company keywords from which I'd like to get the company website URL.
I am looking to run a VBA script which can run these keywords as a Google search, and pull the URL of the first result into a spreadsheet.
I found a similar thread.
The results of this to be hit-and-miss; some keywords return the URL in the next column, others remain blank.
It also seemed to pull the URL of Google's optimised sub-links in the first search result rather than the main website URL: Google Search Result example
I then found the below code here which I ran on a sample list of 1,000 keywords. The author of this blog stipulates that this code works for Mozilla Firefox.
I tested IE code that he has also written but this did not achieve the same results (it was adding hyperlinks consisting of descriptive text from the search results rather than the raw URL).
The Firefox code worked until the 714th row, then returned a error message
"Run time error 91: object variable or with block variable not set"
Spreadsheet layout showing successful results and row at which macro stopped
Sub GoogleURL ()
Dim url As String, lastRow As Long
Dim XMLHTTP As Object
Dim html As Object
Dim objResultDiv As Object
Dim objH As Object
lastRow = Range(“A” & Rows.Count).End(xlUp).Row
For i = 2 To lastRow
url = “https://www.google.co.uk/search?q=” & Cells(i, 1) & “&rnd=” & WorksheetFunction.RandBetween(1, 10000)
Set XMLHTTP = CreateObject(“MSXML2.serverXMLHTTP”)
XMLHTTP.Open “GET”, url, False
XMLHTTP.setRequestHeader “Content-Type”, “text/xml”
XMLHTTP.setRequestHeader “User-Agent”, “Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0”
XMLHTTP.send
Set html = CreateObject(“htmlfile”)
html.body.innerHTML = XMLHTTP.ResponseText
Set objResultDiv = html.getelementbyid(“rso”)
Set objH = objResultDiv.getelementsbytagname(“h3”)(0)
Cells(i, 2).Value = objH.innerText
Set html = CreateObject(“htmlfile”)
html.body.innerHTML = XMLHTTP.ResponseText
Set objResultDiv = html.getelementbyid(“rso”)
Set objH = objResultDiv.getelementsbytagname(“cite”)(0)
Cells(i, 3).Value = objH.innerText
DoEvents
Next
End Sub
As Firefox is a third party browser for the support scope of Microsoft, I can help you to check the VBA code for the IE browser.
You said that the VBA code given in this link for the IE browser generates the description with the link and your requirement is to store description and link in a separate column.
I tried to modify that sample code as per your requirement.
Here is the modified code from that sample.
Option Explicit
Const TargetItemsQty = 1 ' results for each keyword
Sub GWebSearchIECtl()
Dim objSheet As Worksheet
Dim objIE As Object
Dim x As Long
Dim y As Long
Dim strSearch As String
Dim lngFound As Long
Dim st As String
Dim colGItems As Object
Dim varGItem As Variant
Dim strHLink As String
Dim strDescr As String
Dim strNextURL As String
Set objSheet = Sheets("Sheet1")
Set objIE = CreateObject("InternetExplorer.Application")
objIE.Visible = True ' for debug or captcha request cases
y = 1 ' start searching for the keyword in the first row
With objSheet
.Select
.Range(.Columns("B:B"), .Columns("B:B").End(xlToRight)).Delete ' clear previous results
.Range(.Columns("C:C"), .Columns("C:C").End(xlToRight)).Delete ' clear previous results
.Range("A1").Select
Do Until .Cells(y, 1) = ""
x = 2 ' start writing results from column B
.Cells(y, 1).Select
strSearch = .Cells(y, 1) ' current keyword
With objIE
lngFound = 0
.navigate "https://www.google.com/search?q=" & EncodeUriComponent(strSearch) ' go to first search results page
Do
Do While .Busy Or Not .READYSTATE = 4: DoEvents: Loop ' wait IE
Do Until .document.READYSTATE = "complete": DoEvents: Loop ' wait document
Do While TypeName(.document.getelementbyid("res")) = "Null": DoEvents: Loop ' wait [#res] element
Set colGItems = .document.getelementbyid("res").getElementsByClassName("g") ' collection of search result [.g] items
For Each varGItem In colGItems ' process each item in collection
If varGItem.getelementsbytagname("a").Length > 0 And varGItem.getElementsByClassName("st").Length > 0 Then ' must have hyperlink and description
strHLink = varGItem.getelementsbytagname("a")(0).href ' get first hyperlink [a] found in current item
strDescr = GetInnerText(varGItem.getElementsByClassName("st")(0).innerHTML) ' get first description [span.st] found in current item
lngFound = lngFound + 1
'Debug.Print (strHLink)
'Debug.Print (strDescr)
With objSheet ' put result into cell
.Cells(y, x).Value = strDescr
.Hyperlinks.Add .Cells(y, x + 1), strHLink
.Cells(y, x).WrapText = True
x = x + 1 ' next column
End With
If lngFound = TargetItemsQty Then Exit Do ' continue with next keyword - necessary quantity of the results for current keyword found
End If
DoEvents
Next
If TypeName(.document.getelementbyid("pnnext")) = "Null" Then Exit Do ' continue with next keyword - no [a#pnnext.pn] next page button exists
strNextURL = .document.getelementbyid("pnnext").href ' get next page url
.navigate strNextURL ' go to next search results page
Loop
End With
y = y + 1 ' next row
Loop
End With
objIE.Quit
' google web search page contains the elements:
' [div#res] - main search results block
' [div.g] - each result item block within [div#res]
' [a] - hyperlink ancor(s) within each [div.g]
' [span.st] - description(s) within each [div.g]
' [a#pnnext.pn] - hyperlink ancor to the next search results page
End Sub
Function EncodeUriComponent(strText As String) As String
Static objHtmlfile As Object
If objHtmlfile Is Nothing Then
Set objHtmlfile = CreateObject("htmlfile")
objHtmlfile.parentWindow.execScript "function encode(s) {return encodeURIComponent(s)}", "jscript"
End If
EncodeUriComponent = objHtmlfile.parentWindow.encode(strText)
End Function
Function GetInnerText(strText As String) As String
Static objHtmlfile As Object
If objHtmlfile Is Nothing Then
Set objHtmlfile = CreateObject("htmlfile")
objHtmlfile.Open
objHtmlfile.Write "<body></body>"
End If
objHtmlfile.body.innerHTML = strText
GetInnerText = objHtmlfile.body.innerText
End Function
Output in IE 11 browser:
You can try to run it on your side to see the results with large amount of data.
If you meet with any performance issue then I suggest you try it with a smaller amount of data.