css xpath hpricot

hpricot: find elements of type A that have no ancestor of type B or C


I'm using hpricot to process some externally generated HTML.

What is the simplest way to find elements of one type (in my case: img) that do not have an ancestor of other types (in my case: p or div)?

I think the XPath expression //img[not(ancestor::div) and not(ancestor::p)] should do what I'm looking for. Unfortunately, hpricot apparently does not support the ancestor axis. And as far as I know, there is no "no such ancestor" operator in CSS that I could use.


Solution

  • I solved my problem using set operations: I fetched all A nodes and subtracted those that have a B or C ancestor. These sets are easy to express, and my problem is small enough that this causes no performance or resource problems.

    (doc.search("img") - doc.search("p img") - doc.search("div img")).each do |node|
        # process node
    end
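
If the three separate searches ever became a bottleneck, one alternative (not what I actually used) would be a single pass that walks up each node's parent chain. Below is a minimal sketch of that idea. It assumes only that nodes respond to `parent` and `name`, which I believe hpricot elements do. `Node` is a hypothetical stand-in so the example runs without hpricot, and `ancestor_named?` is a helper name made up for illustration.

```ruby
# Hypothetical stand-in for a parsed HTML node; hpricot is not required here.
Node = Struct.new(:name, :parent)

# Returns true if any ancestor of +node+ has one of the given tag names.
# Walks the parent chain until it runs out of parents.
def ancestor_named?(node, names)
  current = node.parent
  while current
    return true if current.respond_to?(:name) && names.include?(current.name)
    current = current.respond_to?(:parent) ? current.parent : nil
  end
  false
end

# Tiny hand-built tree: body > img, and body > div > span > img.
body      = Node.new("body", nil)
div       = Node.new("div", body)
span      = Node.new("span", div)
free_img  = Node.new("img", body)  # no p or div ancestor
boxed_img = Node.new("img", span)  # nested (indirectly) inside a div

images  = [free_img, boxed_img]
orphans = images.reject { |img| ancestor_named?(img, %w[p div]) }
```

With hpricot, the same filter would presumably be written as `doc.search("img").reject { |img| ancestor_named?(img, %w[p div]) }`, under the assumption above about `parent` and `name`.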