Im web scraping a website and i know i always want the 16th table in the IHTMLElementCollection. How do i pick out that specific one? In the creation process i skipped over this by just running a for loop and going through them all with a counter (skipper), but im at the optimization point and this is a big one for me.
For Each HTMLTable In HTMLTables
'Temp variable to set which table gets used - not sure how to
'select the table i want
skipper = skipper + 1
Do While skipper = 16
'Checks if worksheet exists with machine serial number
'if not - creates it, if it is, sets as active
For x = 1 To worksh
If Worksheets(x).Name = MachineSerials Then
Worksheets(MachineSerials).Activate
worksheetexists = True
Exit For
End If
Next x
If worksheetexists = False Then
Set ws = Sheets.Add(After:=Sheets(Sheets.Count))
ws.Name = MachineSerials
Range("A1") = "Last Updated : "
End If
'inserts time stamp for last updated at top of page
Range("B1").Value = Now
RowNum = 2
'Dumps tables information into sheet
For Each HTMLRow In HTMLTable.getElementsByTagName("tr")
ColNum = 1
For Each HTMLCell In HTMLRow.Children
'Checks if the new information is the same as whats already on the screen
If StrComp(HTMLCell.innerText, Cells(RowNum, ColNum)) = 0 Or Format(HTMLCell.innerText, "yyyy/mm/dd") = Format(Cells(RowNum, ColNum), "yyyy/mm/dd") Then
ColNum = ColNum + 1
Else
Cells(RowNum, ColNum) = HTMLCell.innerText
ColNum = ColNum + 1
change = 1
End If
Next HTMLCell
RowNum = RowNum + 1
Next HTMLRow
Exit For
Loop
Next HTMLTable
Try using item
:
The getElementsByTagName() method returns a collection of all elements in the document with the specified tag name, as a NodeList object.
The NodeList object represents a collection of nodes. The nodes can be accessed by index numbers. The index starts at 0.
In your case I'd start with:
Set htmlTables = html.getElementsByTagName("table")
Debug.Print htmlTables.Length
Debug.Print htmlTables.Item(12).innerHTML
This will validate if the code picks up the 13th table (Item(12)
) properly. Then you can continue with the rest of your code, e.g.:
For Each htmlRow In htmlTable.getElementsByTagName("tr")
Debug.Print htmlRow.innerText
Next htmlRow
I hope this will guide you in the right direction.
Edit
In order to combine table and row index, you can use the following code (here's an example for w3schools webpage):
Set htmlTables = html.getElementsByTagName("table")
Set htmlTable = htmlTables.Item(1)
Set htmlRow = htmlTable.getElementsByTagName("tr").Item(2)
Debug.Print htmlRow.innerText
Item(2)
refers to align
row of the 2nd table Item(1)
:
Immediate window properly prints out the following result: