Tags: vba, excel, web-scraping, excel-web-query

Excel VBA web source code - how to extract multiple fields to one sheet


Good afternoon guys. As a follow-up to a previous query, which was very much solved by QHarr, I want to run the solved query against multiple fields from the source code rather than just one.

The URL I am using is: https://finance.yahoo.com/quote/AAPL/?p=AAPL

and the VBA code which takes the 'Previous Close' price is:

Option Explicit

Sub PreviousClose()
    Dim html As HTMLDocument, http As Object, ticker As Range
    Set html = New HTMLDocument
    Set http = CreateObject("WINHTTP.WinHTTPRequest.5.1")

    Dim lastRow As Long, myrng As Range
    With ThisWorkbook.Worksheets("Tickers")

        lastRow = .Cells(.Rows.Count, "A").End(xlUp).Row
        Set myrng = .Range("A2:A" & lastRow)

        For Each ticker In myrng
            If Not IsEmpty(ticker) Then
                With http
                    .Open "GET", "https://finance.yahoo.com/quote/" & ticker.Value & "?p=" & ticker.Value, False
                    .send
                    html.body.innerHTML = .responseText
                End With
                On Error Resume Next
                ticker.Offset(, 1) = html.querySelector("[data-test=PREV_CLOSE-value]").innerText
                On Error GoTo 0
            End If
        Next

    End With
End Sub

Anyway, each field would ideally sit in the same row as the ticker, in the columns to the right of it.

Screenshot of Sheet:

[image of the Tickers sheet]

Any help would be very much appreciated.
Thanks.


Solution

  • tl;dr;

    The code below works for the given test cases. With much longer lists please see the ToDo section.

    API:

    You want to look into an API to provide this info if possible. I believe Alpha Vantage now provides the info the Yahoo Finance API used to.* There is a nice JS tutorial here, and the Alpha Vantage documentation is here. At the very bottom of this answer, I have a quick look at the time series functions available via the API.

    WEBSERVICE function:

    With an API key, you can also potentially use the WEBSERVICE function in Excel to retrieve and parse data. Example here. Not tested.
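
    As a rough illustration of that route (untested; it assumes an Alpha Vantage API key, the datatype=csv option, and Excel 2013+ where WEBSERVICE/WorksheetFunction.WebService is available):

    Sub WebServiceSketch()
        'Worksheet equivalent: =WEBSERVICE("https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol=AAPL&datatype=csv&apikey=yourAPIKey")
        Dim url As String, csv As String
        url = "https://www.alphavantage.co/query?function=TIME_SERIES_DAILY" & _
              "&symbol=AAPL&datatype=csv&apikey=yourAPIKey"
        csv = Application.WorksheetFunction.WebService(url) 'raw CSV text, one row per date
        Debug.Print Split(csv, vbLf)(0)                     'header row of the returned CSV
    End Sub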

    XMLHTTPRequest and class:

    However, I will show you a way using a class and a loop over URLs. You can improve on this. I use a bare-bones class called clsHTTP to hold the XMLHTTP request object. I give it two methods: GetHTMLDoc, which returns the request response as an HTML document, and GetInfo, which returns an array of the items of interest from the page.

    Using a class in this way saves the overhead of repeatedly creating and destroying the XMLHTTP object, and gives a nice descriptive set of exposed methods to handle the required tasks.

    It is assumed your data is as shown, with header row being row 2.

    ToDo:

    The immediately obvious development, IMO, is that you will want to add some error handling. For example, you might want to develop the class to handle server errors; a minimal sketch of one possible status check follows the class code below.


    VBA:

    So, in your project you add a class module called clsHTTP and put the following:

    clsHTTP

    Option Explicit
    
    Private http As Object
    Private Sub Class_Initialize()
        Set http = CreateObject("MSXML2.XMLHTTP")
    End Sub
    
    Public Function GetHTMLDoc(ByVal URL As String) As HTMLDocument
        Dim html As HTMLDocument
        Set html = New HTMLDocument
        With http
            .Open "GET", URL, False
            .send
            html.body.innerHTML = StrConv(.responseBody, vbUnicode)
            Set GetHTMLDoc = html
        End With
    End Function
    Public Function GetInfo(ByVal html As HTMLDocument, ByVal endPoint As Long) As Variant
        Dim nodeList As Object, i As Long, result(), counter As Long
        Set nodeList = html.querySelectorAll("tbody td")
        ReDim result(0 To endPoint - 1)
        For i = 1 To 2 * endPoint Step 2
            result(counter) = nodeList.item(i).innerText
            counter = counter + 1
        Next    
        GetInfo = result
    End Function
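
    As flagged in the ToDo section, here is a minimal, untested sketch of the kind of status check you could build into the class (the GetHTMLDocSafe name and the empty-document fallback are purely illustrative):

    'Possible variant of GetHTMLDoc that bails out on a non-200 response
    Public Function GetHTMLDocSafe(ByVal URL As String) As HTMLDocument
        Dim html As HTMLDocument
        Set html = New HTMLDocument
        With http
            .Open "GET", URL, False
            .send
            If .Status = 200 Then
                html.body.innerHTML = StrConv(.responseBody, vbUnicode)
            Else
                html.body.innerHTML = vbNullString 'empty document signals the request failed
            End If
        End With
        Set GetHTMLDocSafe = html
    End Function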
    

    In a standard module (module 1)

    Option Explicit
    Public Sub GetYahooInfo()
        Dim tickers(), ticker As Long, lastRow As Long, headers(), results()
        Dim wsSource As Worksheet, http As clsHTTP, html As HTMLDocument
    
        Application.ScreenUpdating = False
    
        Set wsSource = ThisWorkbook.Worksheets("Sheet1") '<== Change as appropriate to sheet containing the tickers
        Set http = New clsHTTP
    
        headers = Array("Ticker", "Previous Close", "Open", "Bid", "Ask", "Day's Range", "52 Week Range", "Volume", "Avg. Volume", "Market Cap", "Beta", "PE Ratio (TTM)", "EPS (TTM)", _
                        "Earnings Date", "Forward Dividend & Yield", "Ex-Dividend Date", "1y Target Est")
    
        With wsSource
            lastRow = GetLastRow(wsSource, 1)
            Select Case lastRow
            Case Is < 3
                Exit Sub
            Case 3
                ReDim tickers(1 To 1, 1 To 1): tickers(1, 1) = .Range("A3").Value '1-based, to match the array a multi-cell range returns
            Case Is > 3
                tickers = .Range("A3:A" & lastRow).Value
            End Select
    
            ReDim results(0 To UBound(tickers, 1) - 1)
            Dim i As Long, endPoint As Long
            endPoint = UBound(headers)
    
            For ticker = LBound(tickers, 1) To UBound(tickers, 1)
                If Not IsEmpty(tickers(ticker, 1)) Then
                    Set html = http.GetHTMLDoc("https://finance.yahoo.com/quote/" & tickers(ticker, 1) & "/?p=" & tickers(ticker, 1))
                    results(ticker - 1) = http.GetInfo(html, endPoint)
                    Set html = Nothing
                Else
                    results(ticker - 1) = vbNullString
                End If
            Next
    
            .Cells(2, 1).Resize(1, UBound(headers) + 1) = headers
            For i = LBound(results) To UBound(results)
                .Cells(3 + i, 2).Resize(1, endPoint) = results(i)
            Next
        End With   
        Application.ScreenUpdating = True
    End Sub
    
    Public Function GetLastRow(ByVal ws As Worksheet, Optional ByVal columnNumber As Long = 1) As Long
        With ws
            GetLastRow = .Cells(.Rows.Count, columnNumber).End(xlUp).Row
        End With
    End Function
    

    Results:

    [screenshot of the populated output sheet]


    Notes on GetInfo method and CSS selectors:

    The class method GetInfo extracts the info from each webpage using a CSS selector combination to target the elements of interest.

    The info we are after on each page is housed in two adjacent summary tables, for example: [screenshot of the two summary tables]

    Rather than mess around with multiple tables I simply target all the table cells, within table body elements, with a selector combination of tbody td.

    The CSS selector combination is applied via the querySelectorAll method of HTMLDocument, returning a static nodeList.

    The returned nodeList items have headers at even indices and the required data at odd indices. I only want the first two tables of info, so I terminate the loop over the returned nodeList when I have gone twice the length of the headers of interest. I use a Step 2 loop from index 1 to retrieve only the data of interest, minus the headers.

    A sample of what the returned nodeList looks like: [screenshot of sample nodeList contents]
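
    If you want to inspect that structure yourself, a small throwaway routine along these lines (assuming the clsHTTP class above; the InspectNodeList name is just for illustration) will print the label/value pairs to the Immediate window:

    Sub InspectNodeList()
        Dim http As clsHTTP, html As HTMLDocument, nodeList As Object, i As Long
        Set http = New clsHTTP
        Set html = http.GetHTMLDoc("https://finance.yahoo.com/quote/AAPL?p=AAPL")
        Set nodeList = html.querySelectorAll("tbody td")
        For i = 0 To nodeList.Length - 2 Step 2 'labels at even indices, values at the odd index that follows
            Debug.Print nodeList.item(i).innerText & " = " & nodeList.item(i + 1).innerText
        Next
    End Sub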


    References (VBE > Tools > References):

    1. Microsoft HTML Object Library

    Alpha Vantage API:

    A quick look at the time series API calls shows that a query string like the following can be used:

    https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol=AA&outputsize=full&apikey=yourAPIKey
    

    This yields a JSON response in which the Time Series (Daily) sub dictionary of the overall returned dictionary held 199 dates. Each date carries open, high, low, close and volume figures.

    A little digging through the documentation should reveal whether bundling of tickers is possible (I couldn't see this quickly) and whether more of your initial items of interest are available via a different query string.

    There is more info available if, for example, you use the TIME_SERIES_DAILY_ADJUSTED function in the URL call:

    https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol=AA&outputsize=full&apikey=yourAPIkey
    

    Here, you then also get adjusted close, dividend amount and split coefficient figures for each date.

    You can parse the JSON response using a JSON parser such as JSONConverter.bas, and there are also options for CSV download.
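
    As a rough, untested sketch of that parsing route (it assumes JSONConverter.bas has been imported and a reference to Microsoft Scripting Runtime is set, as VBA-JSON requires; the Sub name and apikey are placeholders):

    Sub AlphaVantageCloseSketch()
        Dim http As Object, json As Object, daily As Object, dt As Variant
        Set http = CreateObject("MSXML2.XMLHTTP")
        http.Open "GET", "https://www.alphavantage.co/query?function=TIME_SERIES_DAILY" & _
                         "&symbol=AA&outputsize=compact&apikey=yourAPIKey", False
        http.send
        Set json = JsonConverter.ParseJson(http.responseText) 'nested dictionaries/collections
        Set daily = json("Time Series (Daily)")               'the sub dictionary discussed above
        For Each dt In daily.Keys                             'one key per returned date
            Debug.Print dt, daily(dt)("4. close")             'closing price for that date
        Next
    End Sub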

    * Worth doing some research on which APIs provide the most coverage of your items. Alpha Vantage doesn't appear to cover as many as my code above retrieves.