I have a partial HTML document:
<h2>Destinations</h2>
<div>It is nice <b>anywhere</b> but here.
<ul>
<li>Florida</li>
<li>New York</li>
</ul>
<h2>Shopping List</h2>
<ul>
<li>Booze</li>
<li>Bacon</li>
</ul>
For every <li> item, I want to know the category the item is in, i.e., the text in the preceding <h2> tag.
This code does not work, but this is what I'm trying to do:
@page.search('li').each do |li|
li.previous('h2').text
end
Nokogiri allows you to use XPath expressions to locate an element:
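# Assumes doc is an already-parsed document, e.g. doc = Nokogiri::HTML(html)
categories = []
doc.xpath("//li").each do |elem|
  # From the enclosing list, collect the h2 siblings before it and keep the closest one
  categories << elem.parent.xpath("preceding-sibling::h2").last.text
end
categories.uniq! # every li contributes its heading, so drop the repeats
p categories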
The first part finds all "li" elements. For each one, we go to its parent (ul or ol), then look for an h2 element that comes before that parent at the same level (preceding-sibling). There can be more than one, so we take the last (i.e., the one closest to the current position).
We need to call "uniq!" because we collect the h2 once for every 'li' (each 'li' is a starting point), so the same heading shows up several times.
Using your own HTML example, this code outputs:
["Destinations", "Shopping List"]