I’m using Rails 4.2.7 with Ruby (2.3) and Nokogiri. How do I find the most direct tr children of a table, as opposed to nested ones? Currently I find table rows within a table like so …
tables = doc.css('table')
tables.each do |table|
  rows = table.css('tr')
  # ...
end
This not only finds direct rows of a table, e.g.
<table>
<tbody>
<tr>…</tr>
but it also finds rows within rows, e.g.
<table>
<tbody>
<tr>
<td>
<table>
<tr>This is found</tr>
</table>
</td>
</tr>
How do I refine my search to only find the direct tr elements?
You can do it in a couple of steps using XPath. First you need to find the “level” of the table (i.e. how nested it is in other tables), then find all descendant tr elements that have the same number of table ancestors:
tables = doc.xpath('//table')
tables.each do |table|
  level = table.xpath('count(ancestor-or-self::table)')
  rows = table.xpath(".//tr[count(ancestor::table) = #{level}]")
  # do what you want with rows...
end
In the more general case, where you might have tr elements nested directly inside other tr elements, you could do something like this (this would be invalid HTML, but you might have XML or some other tags):
tables.each do |table|
  # Find the first descendant tr, and determine its level. This
  # will be a "top-level" tr for this table. "level" here means how
  # many tr elements (including itself) are between it and the
  # document root.
  level = table.xpath("count(descendant::tr[1]/ancestor-or-self::tr)")
  # Now find all descendant trs that have that same level. Since
  # the table itself is at a fixed level, this means all these nodes
  # will be "top-level" rows for this table.
  rows = table.xpath(".//tr[count(ancestor-or-self::tr) = #{level}]")
  # handle rows...
end
The first step could be broken into two separate queries, which may be clearer:
first_tr = table.at_xpath(".//tr")
level = first_tr.xpath("count(ancestor-or-self::tr)")
(This will fail if there is a table with no tr elements though, as first_tr will be nil. The combined XPath above handles that situation correctly.)