excelvbaweb-scrapingmshtml

How to pick specific table from MSHTML.IHTMLElementCollection


Im web scraping a website and i know i always want the 16th table in the IHTMLElementCollection. How do i pick out that specific one? In the creation process i skipped over this by just running a for loop and going through them all with a counter (skipper), but im at the optimization point and this is a big one for me.

    For Each HTMLTable In HTMLTables
    'Temp variable to set which table gets used - not sure how to
    'select the table i want
    skipper = skipper + 1
    Do While skipper = 16
        'Checks if worksheet exists with machine serial number
        'if not - creates it, if it is, sets as active
        For x = 1 To worksh
            If Worksheets(x).Name = MachineSerials Then
                Worksheets(MachineSerials).Activate
                worksheetexists = True
                Exit For
            End If
        Next x
        If worksheetexists = False Then
            Set ws = Sheets.Add(After:=Sheets(Sheets.Count))
            ws.Name = MachineSerials
            Range("A1") = "Last Updated : "
        End If
        'inserts time stamp for last updated at top of page

        Range("B1").Value = Now
        RowNum = 2

        'Dumps tables information into sheet
        For Each HTMLRow In HTMLTable.getElementsByTagName("tr")
            ColNum = 1
            For Each HTMLCell In HTMLRow.Children
                'Checks if the new information is the same as whats already on the screen
                If StrComp(HTMLCell.innerText, Cells(RowNum, ColNum)) = 0 Or Format(HTMLCell.innerText, "yyyy/mm/dd") = Format(Cells(RowNum, ColNum), "yyyy/mm/dd") Then
                    ColNum = ColNum + 1
                Else
                    Cells(RowNum, ColNum) = HTMLCell.innerText
                    ColNum = ColNum + 1
                    change = 1
                End If
            Next HTMLCell
            RowNum = RowNum + 1
        Next HTMLRow
        Exit For
    Loop
Next HTMLTable

Solution

  • Try using item:

    The getElementsByTagName() method returns a collection of all elements in the document with the specified tag name, as a NodeList object.

    The NodeList object represents a collection of nodes. The nodes can be accessed by index numbers. The index starts at 0.

    In your case I'd start with:

    Set htmlTables = html.getElementsByTagName("table")
    Debug.Print htmlTables.Length
    Debug.Print htmlTables.Item(12).innerHTML
    

    This will validate if the code picks up the 13th table (Item(12)) properly. Then you can continue with the rest of your code, e.g.:

    For Each htmlRow In htmlTable.getElementsByTagName("tr")
        Debug.Print htmlRow.innerText
    Next htmlRow
    

    I hope this will guide you in the right direction.

    Edit

    In order to combine table and row index, you can use the following code (here's an example for w3schools webpage):

    Set htmlTables = html.getElementsByTagName("table")
    Set htmlTable = htmlTables.Item(1)
    
    Set htmlRow = htmlTable.getElementsByTagName("tr").Item(2)
    Debug.Print htmlRow.innerText
    

    Item(2) refers to align row of the 2nd table Item(1): enter image description here

    Immediate window properly prints out the following result:

    enter image description here