ruby xpath firebug hpricot

tbody tag in XPath produced by Firebug


I'm trying to extract some data from online HTML pages using the Ruby Hpricot library. I use the Firefox extension Firebug to get the XPath of a selected item.

The XPath expression it produces always contains an extra tbody tag. In some cases I must remove the tbody step from the expression to get results, while in other cases I must keep it.

I just can't figure out when to keep the tbody tag and when not to.


Solution

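  • Firebug reports the path in the browser's DOM, where the HTML parser inserts a tbody element even when the page source has none. A library that parses the raw source sees a tbody only if it is actually written in the markup, which is likely why the same step sometimes matches and sometimes does not. To handle both cases with one expression, use an XPath expression of the following kind: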

     /locStep1/locStep2/.../table/YourSubExpression
    |
     /locStep1/locStep2/.../table/tbody/YourSubExpression
    

    If the table doesn't have a tbody child, the second operand of the union operator (|) selects no nodes and the first operand selects the wanted nodes.

    If the table does have a tbody child, the first operand of the union operator selects no nodes and the second operand selects the wanted nodes.

    The end result: in both cases the wanted nodes are selected.
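
    The union approach above can be sketched in Ruby as follows. This is only an illustration under stated assumptions: it uses REXML rather than Hpricot so that it is self-contained (whether Hpricot's XPath engine accepts the | operator is not confirmed here), and the sample markup, the constant names, and the cell_texts helper are all hypothetical.

```ruby
require "rexml/document"

# Hypothetical sample markup: the same table written without and with <tbody>.
WITHOUT_TBODY = "<html><body><table><tr><td>a</td></tr></table></body></html>"
WITH_TBODY    = "<html><body><table><tbody><tr><td>a</td></tr></tbody></table></body></html>"

# One expression covering both shapes: for any given table, only one
# operand of the union (|) matches, so the result is the same either way.
UNION_XPATH = "/html/body/table/tr/td | /html/body/table/tbody/tr/td"

# Hypothetical helper: parse the markup and return the text of each matched node.
def cell_texts(markup, xpath)
  doc = REXML::Document.new(markup)
  REXML::XPath.match(doc, xpath).map(&:text)
end

puts cell_texts(WITHOUT_TBODY, UNION_XPATH).inspect
puts cell_texts(WITH_TBODY, UNION_XPATH).inspect
```

    With this, the same expression can be used whether the path was copied from Firebug (with tbody) or written against the page source (without it).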