asp.net c#-4.0 html-agility-pack

Multi-level scraping with HtmlAgilityPack


I'm trying to scrape some data that comes in the following format:

<div class="ac_acdetail">
<div id="ac_makemodel">
<a href='/aircraft-for-sale/turbine/jets/Bombardier-Challenger/300-34856/' title='Bombardier Challenger 300' class=''>Bombardier Challenger 300</a>
</div>
<div id="ac_price">FOR SALE</div>
<div class="ac_keydetail">
<div class="title">PRICE:</div>
<div class="item">15,950,000 <font size=-2>USD</font></div>
<div class="clear"></div>
</div>
<div class="ac_keydetail">
<div class="title">YEAR:</div>
<div class="item">2009</div>
<div class="clear"></div>
</div>
<div class="ac_keydetail">
<div class="title">S/N:</div>
<div class="item">20266</div>
<div class="clear"></div>
</div>
<div class="ac_keydetail">
<div class="title">TTAF:</div>
<div class="item">1150</div>
<div class="clear"></div>
</div>
<div class="ac_keydetail">
<div class="title">LOCATION:</div>
<div class="item">USA</div>
<div class="clear"></div>
</div>
</div>

I need to get the text held within each div with the class 'item'. What makes this a problem, for me at least, is getting each 'item' in a known order so that I can store it against its corresponding column in the database.

Is it possible to, say, grab all the 'item' divs and then access them individually by their position in the sequence? If so, how?

Or is it necessary to loop through the results and pick them out with each iteration?

Furthermore, inside the 'item' div related to price, is it possible to extract the 'USD' text as another variable?


Solution

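  • You can use indexes to select the nth node. XPath indexes start at 1 and count among siblings, so this XPath selects any div that is the third div child of its parent and sits somewhere inside a div that is the fourth div child of its parent: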

    //div[4]//div[3]
    

    Or you can select by specific text:

    //div//div[text()='USA']
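
As a rough illustration of how this could look in C# for the markup in the question, the sketch below pairs each 'item' with the 'title' in the same ac_keydetail block instead of relying on position alone, and reads the currency from the nested font element. The class name AircraftScraper, the method ParseDetails, and the "CURRENCY" key are made up for this example, and it assumes the class attributes match exactly as shown in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class AircraftScraper
{
    // Returns title -> value pairs, e.g. "YEAR" -> "2009".
    // The currency, if present, is stored under the made-up key "CURRENCY".
    public static Dictionary<string, string> ParseDetails(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new Dictionary<string, string>();

        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection rows =
            doc.DocumentNode.SelectNodes("//div[@class='ac_keydetail']");
        if (rows == null)
        {
            return result;
        }

        foreach (HtmlNode row in rows)
        {
            HtmlNode title = row.SelectSingleNode("./div[@class='title']");
            HtmlNode item = row.SelectSingleNode("./div[@class='item']");
            if (title == null || item == null)
            {
                continue;
            }

            // "PRICE:" -> "PRICE"
            string key = title.InnerText.Trim().TrimEnd(':');

            // Keep only the item's own text nodes, so the nested
            // <font>USD</font> does not end up in the value.
            string value = string.Concat(
                item.ChildNodes
                    .Where(n => n.NodeType == HtmlNodeType.Text)
                    .Select(n => n.InnerText)).Trim();
            result[key] = value;

            // The currency sits in a <font> child of the price item.
            HtmlNode currency = item.SelectSingleNode("./font");
            if (currency != null)
            {
                result["CURRENCY"] = currency.InnerText.Trim();
            }
        }

        return result;
    }
}
```

With the sample markup, details["PRICE"], details["YEAR"] and details["CURRENCY"] can then be mapped to database columns by name. If positional access is preferred, doc.DocumentNode.SelectNodes("//div[@class='item']") should return the nodes in document order, so they can be indexed like items[0], items[1], and so on, but that breaks if the site ever omits or reorders a field.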