html .net web-scraping html-agility-pack html-content-extraction

HTML Agility Pack - Accessing Siblings?


The HTML Agility Pack is great for getting descendants, whole tables, and so on, but how can you use it in the situation below?

...Html Code above...

<dl>
<dt>Location:</dt>
<dd>City, London</dd>
<dt style="padding-bottom:10px;">Distance:</dt>
<dd style="padding-bottom:10px;">0 miles</dd>
<dt>Date Issued:</dt>
<dd>26/10/2010</dd>
<dt>type:</dt>
<dd>cement</dd>
</dl>

...HTML Code below....

How could you find out whether the distance is less than 15 miles in this case? I understand you could do something with elements, but would you have to get all the elements, find the correct one, and then extract the number just to check its value? Or is there a way to use regex with the Agility Pack to achieve this in a better way?


Solution

  • I'm pretty sure (though I haven't checked) that it supports the following-sibling:: axis, so you could find the node "dt[.='Distance:']" and then call node.SelectSingleNode("following-sibling::dd[1]"). Alternatively (and more simply), use node.NextSibling if you are sure that the dd always immediately follows the dt; note that if there is whitespace between the two tags, NextSibling may return that whitespace text node rather than the dd.

    For example:

    // Select the first <dd> that follows the "Distance:" <dt>.
    string distance = doc.DocumentNode.SelectSingleNode(
        "//dt[.='Distance:']/following-sibling::dd[1]").InnerText;
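
To go from that string to the "less than 15 miles" check in the question, something along these lines might work. It is only a sketch: the class DistanceCheck and the helper ReadDistanceMiles are names made up for illustration, and it assumes the value always looks like "0 miles", with the number first and written with a "." decimal separator.

```csharp
using System;
using System.Globalization;
using HtmlAgilityPack;

public static class DistanceCheck
{
    // Hypothetical helper (not part of the Agility Pack): returns the number in
    // the <dd> following <dt>Distance:</dt>, or null if it cannot be read.
    public static double? ReadDistanceMiles(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // [1] restricts the match to the nearest following <dd> sibling.
        HtmlNode dd = doc.DocumentNode.SelectSingleNode(
            "//dt[.='Distance:']/following-sibling::dd[1]");
        if (dd == null)
        {
            return null;
        }

        // Assumption: text such as "0 miles"; keep the first whitespace-separated token.
        string[] parts = dd.InnerText.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length == 0)
        {
            return null;
        }

        double miles;
        bool ok = double.TryParse(parts[0], NumberStyles.Float,
                                  CultureInfo.InvariantCulture, out miles);
        return ok ? miles : (double?)null;
    }

    public static void Main()
    {
        string html =
            "<dl><dt>Location:</dt><dd>City, London</dd>" +
            "<dt style=\"padding-bottom:10px;\">Distance:</dt>" +
            "<dd style=\"padding-bottom:10px;\">0 miles</dd></dl>";

        double? miles = ReadDistanceMiles(html);
        bool withinRange = miles.HasValue && miles.Value < 15;
        Console.WriteLine(withinRange ? "within 15 miles" : "not within 15 miles");
    }
}
```

Returning null rather than throwing means a page without a Distance entry is simply treated as "not within range"; whether that is the right behaviour depends on your data. No regex is needed as long as the number comes first in the text.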