Tags: html, excel, vba, parsing, queryselector

How to get the nth DIV of a tag in HTML using VBA and selectors for export to Excel


This is the HTML of a public website:

 <div class="flex flex-col md:flex-row mb-[1.0rem]">
     <div class="align-text-top pr-1 md:pr-7 w-41">NUMBER:</div>
     <div class="hn-4 md:ml-1">TWO</div>
 </div>

I need to extract the innerText of the div with class="hn-4 md:ml-1" (TWO in this case), but only if the innerText of the div directly above it, the one with class="align-text-top pr-1 md:pr-7 w-41", is NUMBER:.

How do I need to change my code (it doesn't work!) to get the innerText of that second child div?

Function TestFunction()

Dim html As New HTMLDocument
Dim ele1 As Object, ele2 As Object
Dim VarGetContent as String
Dim x As Long

Url = "https:xxxxxxxxxxxxxxxxxx"
        
With CreateObject("MSXML2.XMLHTTP.6.0")
  .Open "GET", Url, False
  .send
   html.body.innerHTML = .responseText
End With
 
Set ele1 = html.querySelectorAll("[class*='align-text-top pr-1 md:pr-7 w-41']")
Set ele2 = html.querySelectorAll("[class*='hn-4 md:ml-1']")
               
For x = 0 To ele1.length - 1
  If ele1.Item(x).innerText = "NUMBER:" Then
  VarGetContent = ele2.Item(x).innerText
  MsgBox VarGetContent
  End If
  Next
  
End Function

Thanks!


Solution

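  • In your loop I don't see anything that ties the second div to the first: ele2.Item(x) is just the x-th match on the page, not necessarily the sibling of ele1.Item(x).

    Here's a corrected version that starts from the label div and walks to its next sibling element: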

    Function TestFunction()
        Dim html As New HTMLDocument
        Dim ele1 As Object, parentDiv As Object
        Dim sibling As Object
        Dim VarGetContent As String
        Dim Url As String
        
        Url = "https:xxxxxxxxxxxxxxxxxx"
                
        ' Load the webpage
        With CreateObject("MSXML2.XMLHTTP.6.0")
            .Open "GET", Url, False
            .send
            html.body.innerHTML = .responseText
        End With
         
        ' Find all elements matching the class
        Set ele1 = html.querySelectorAll("div.align-text-top.pr-1.md\:pr-7.w-41")
        
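        ' Loop through matching elements by index; For Each over a
        ' querySelectorAll result has been reported to fail or crash Excel
        Dim i As Long
        For i = 0 To ele1.Length - 1
            Set parentDiv = ele1.Item(i)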
            If Trim(parentDiv.innerText) = "NUMBER:" Then
                ' Get the next sibling, skipping text nodes until we find an actual element.
                ' VBA's And does not short-circuit, so the Nothing check is done on its own.
                Set sibling = parentDiv.nextSibling
                Do While Not sibling Is Nothing
                    If sibling.nodeType = 1 Then Exit Do ' 1 = Element node
                    Set sibling = sibling.nextSibling
                Loop
                
                ' If a valid sibling is found, extract text
                If Not sibling Is Nothing Then
                    VarGetContent = Trim(sibling.innerText)
                    MsgBox VarGetContent
                End If
            End If
        Next
    End Function
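
    Since the goal is export to Excel, the same sibling walk could be wrapped in a function that returns the text instead of showing a MsgBox. The following is only a sketch: the names GetValueAfterLabel and ExportExample, the shortened selector "div.align-text-top", and the target cell A1 are assumptions for illustration and are not part of the original post. It needs the same "Microsoft HTML Object Library" reference that the code above relies on.

    ```vba
    ' Hypothetical helper (not from the original post): returns the trimmed text of
    ' the first element that follows a label div whose text equals labelText,
    ' or an empty string if no such pair is found.
    Function GetValueAfterLabel(html As HTMLDocument, labelText As String) As String
        Dim labels As Object, node As Object
        Dim i As Long

        ' Assumption: matching on one class is enough; tighten the selector if
        ' other divs on the real page share this class.
        Set labels = html.querySelectorAll("div.align-text-top")

        For i = 0 To labels.Length - 1
            If Trim$(labels.Item(i).innerText) = labelText Then
                ' Walk forward past text nodes until an element node is reached
                Set node = labels.Item(i).nextSibling
                Do While Not node Is Nothing
                    If node.nodeType = 1 Then ' 1 = Element node
                        GetValueAfterLabel = Trim$(node.innerText)
                        Exit Function
                    End If
                    Set node = node.nextSibling
                Loop
            End If
        Next i
    End Function

    Sub ExportExample()
        Dim html As New HTMLDocument

        ' Sample markup from the question; in practice assign .responseText here
        html.body.innerHTML = _
            "<div class=""flex flex-col md:flex-row mb-[1.0rem]"">" & _
            "<div class=""align-text-top pr-1 md:pr-7 w-41"">NUMBER:</div>" & _
            "<div class=""hn-4 md:ml-1"">TWO</div></div>"

        ' Assumed target: cell A1 of the active sheet
        ActiveSheet.Range("A1").Value = GetValueAfterLabel(html, "NUMBER:")
    End Sub
    ```

    Returning a String keeps the parsing separate from the output, so the caller can decide whether the value goes to a cell, a MsgBox, or a loop over several labels.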