html vba dom getelementbyid getelementsbyname

Loop through multiple divs using VBA


I am trying to extract information from an HTML page using VBA. This is the HTML from which I am trying to extract the information.

<div id="profile-education">

  <div class="position  first education vevent vcard" id="xxxxxx">
  University 1
  <span class="degree">Ph.D.</span>
  <span class="major">Computer Science</span>
  <p class="period">
  <abbr class="dtstart" title="2005-01-01">2005</abbr> &#8211; <abbr class="dtend" title="2012-12-31">2012</abbr>
  </p>
  </div>

  <div class="position  education vevent vcard" id="xxxxxx">  
  University 2                  
  <span class="degree">M.Eng.</span> 
  <span class="major">Computer Science</span>
  <p class="period">
  <abbr class="dtstart" title="2000-01-01">2000</abbr> &#8211; <abbr class="dtend" title="2004-12-31">2004</abbr>
  </p>
  </div>

</div>

I want to extract the information in the below format.

In my VBA code, the following lines extract the entire block as a single string.

Dim openedpage as String
openedpage = iedoc1.getElementById("profile-education").innerText

However, if I use the following statement instead, I can get the text of one particular span.

openedpage = iedoc1.getElementById("profile-education").getElementsByTagName("span")(0).innerText

The above code gives me Ph.D. as the output. However, I will not know the total number of spans beforehand, so I cannot simply hard-code span(0) and span(1). I would also like to extract the information for every inner div, and I will not know how many of those there are either. Basically, I want a loop that iterates through the div tags inside the element with the id profile-education, so that I can extract the information from each div and its spans.


Solution

    Dim divs, div

    ' Collect every div nested inside the element with id "profile-education"
    Set divs = iedoc1.getElementById("profile-education").getElementsByTagName("div")

    ' Each div is one education entry
    For Each div In divs
        Debug.Print "*************************************"
        Debug.Print div.ChildNodes(0).toString                      ' leading text node (university name)
        Debug.Print div.getElementsByTagName("span")(0).innerText   ' degree
        Debug.Print div.getElementsByTagName("span")(1).innerText   ' major
        '  etc...
    Next div
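
The loop above still indexes span(0) and span(1) directly. If the number of spans per entry is not known in advance, a second For Each over the spans avoids the hard-coded indexes. The following is only a minimal sketch, not part of the original answer: the procedure name ListEducation and its doc parameter are hypothetical, and it assumes doc is an already loaded HTML document such as iedoc1 from the question.

```vba
Option Explicit

' Sketch: visit every div under #profile-education and every span inside it,
' without assuming how many of either exist.
' ListEducation and doc are hypothetical names used for illustration only.
Sub ListEducation(ByVal doc As Object)
    Dim container As Object
    Dim div As Object
    Dim span As Object

    Set container = doc.getElementById("profile-education")
    If container Is Nothing Then Exit Sub   ' id not present on the page

    For Each div In container.getElementsByTagName("div")
        Debug.Print "*************************************"
        For Each span In div.getElementsByTagName("span")
            ' className is "degree" or "major" in the sample markup
            Debug.Print span.className & ": " & span.innerText
        Next span
    Next div
End Sub
```

It would be called as ListEducation iedoc1. Note that getElementsByTagName returns all descendant divs, so if an entry ever contained nested divs, those would be visited as separate entries too.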