clojure enlive

Use Enlive to match a specific TD tag in a group of TD tags


I am just getting started with Enlive for an HTML screen-scraping task. If I want the text from the second and fourth TD nodes of the following table, how do I specify the selector? I read through the tutorial but didn't find any examples of how to express what in XPath would be:

/html/body/table/tr/td[2] and /html/body/table/tr/td[4] (XPath indices are one-based)

<html>
<body>
<table width="100%"  border="0" cellspacing="3" cellpadding="2">
  <tr>
    <td width="15%" class="labels">Part No</td>
    <td class="datafield">I2013-00007</td>
    <td class="labels"><div align="right">Parcel No</div></td>
    <td colspan="3" class="datafield">07-220-12-03-01-2-00-000</td>
  </tr>
</table>
</body>
</html>

I need to capture the text value from those two TD nodes.


Solution

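  • You can use nth-of-type, which takes a one-based index like the CSS pseudo-class of the same name. This example selects the second TD; pass 4 instead of 2 to get the fourth: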

    user> (require '[net.cgrand.enlive-html :as html])
    nil
    user> (def test-html 
    "<html><body><table width='100%'  border='0' cellspacing='3' cellpadding='2'><tr><td width='15%' class='labels'>Part No</td><td class='datafield'>I2013-00007</td><td class='labels'><div align='right'>Parcel No</div></td><td colspan='3' class='datafield'>07-220-12-03-01-2-00-000</td></tr></table></body></html>")
    #'user/test-html
    user> (:content (first (html/select (html/html-resource (java.io.StringReader. test-html)) [[:td (html/nth-of-type 2)]])))
    ("I2013-00007")
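
Since the question asks for both the second and the fourth TD, here is a minimal sketch that wraps the same selector in a helper and maps it over both positions. The names `page` and `td-text` are made up for this illustration and are not part of Enlive. It uses `html/text` rather than `:content`, so a cell that wraps its text in a nested element (like the `div` in the third TD) would still come back as a plain string.

```clojure
(require '[net.cgrand.enlive-html :as html])

(def test-html
  "<html><body><table width='100%'  border='0' cellspacing='3' cellpadding='2'><tr><td width='15%' class='labels'>Part No</td><td class='datafield'>I2013-00007</td><td class='labels'><div align='right'>Parcel No</div></td><td colspan='3' class='datafield'>07-220-12-03-01-2-00-000</td></tr></table></body></html>")

;; Parse once and reuse the node tree (hypothetical name `page`).
(def page (html/html-resource (java.io.StringReader. test-html)))

;; Hypothetical helper: text of the first TD that is the n-th TD
;; among its siblings (one-based, as with CSS nth-of-type).
(defn td-text [nodes n]
  (->> (html/select nodes [[:td (html/nth-of-type n)]])
       first
       html/text))

(map #(td-text page %) [2 4])
;; => ("I2013-00007" "07-220-12-03-01-2-00-000")
```

If the page has more than one row, the selector matches the n-th TD in every row, and this helper returns only the first match. In that case, drop the `first` and map `html/text` over all the selected nodes.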