I'm trying to extract some data from HTML pages using the Ruby Hpricot library. I use the Firefox extension Firebug to get the XPath of a selected item.
There's always an extra tbody tag in the XPath expression it produces. In some cases I must remove the tbody tag from the expression to get results, while in other cases I must keep it.
I just can't figure out when to keep the tbody tag and when to remove it.
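Browsers insert a tbody element into the DOM when the page source puts tr directly inside table, so the XPath Firebug reports may not match the raw HTML that your parser sees. To handle both cases, use XPath expressions of the following kind: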
/locStep1/locStep2/.../table/YourSubExpression
|
/locStep1/locStep2/.../table/tbody/YourSubExpression
If the table doesn't have a tbody child, then the second argument of the union operator (|) selects no nodes and the first argument of the union selects the wanted nodes.
Alternatively, if the table has a tbody child, then the first argument of the union operator selects no nodes and the second argument of the union selects the wanted nodes.
The end result: in both cases the wanted nodes are selected.
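As a rough illustration of the union approach, here is a minimal Ruby sketch. It uses REXML from Ruby's standard distribution rather than Hpricot, because I'm not certain how completely Hpricot supports the XPath union operator. The sample markup, the /html/body/table path and the cell_texts helper are made up for the example, and REXML only accepts well-formed markup, so treat this as a demonstration of the expression rather than a drop-in scraper.

```ruby
require "rexml/document"

# Hypothetical helper: returns the text of every table cell, whether or not
# the table wraps its rows in a <tbody>.
def cell_texts(markup)
  # Union of the two path variants; for a given document only one side matches.
  xpath = "/html/body/table/tr/td | /html/body/table/tbody/tr/td"
  doc = REXML::Document.new(markup)
  REXML::XPath.match(doc, xpath).map(&:text)
end

# Made-up sample documents: the same table without and with a <tbody>.
without_tbody = "<html><body><table><tr><td>a</td><td>b</td></tr></table></body></html>"
with_tbody = "<html><body><table><tbody><tr><td>a</td><td>b</td></tr></tbody></table></body></html>"

p cell_texts(without_tbody)
p cell_texts(with_tbody)
```

Both calls should return the same cells, so the calling code no longer needs to know which form of the table it was given.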