xpath infoset

XPath expression


This question regards XPath expressions.

I want to find the average length of all URLs in a web page that point to a .pdf file.

So far I have constructed the following expression, but it does not work:

sum(string-length(string(//a/@href[contains(., ".pdf")]))) div count(//a/@href[contains(., ".pdf")])

Any help will be appreciated!


Solution

  • You will need XPath 2.0.

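    For calculating the sum of the string lengths, XPath 1.0 is not enough: its sum(...) accepts only a node-set, and string-length(...) takes a single string rather than one string per node.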

    With XPath 2.0, the functions avg(...) and ends-with(...) let you reduce the expression to

    avg(//a/@href[ends-with(., '.pdf')]/string-length())
    

    If you have to stick with XPath 1.0, all you can do is use the expression below to fetch the URLs and calculate the average outside XPath.


    In either case, the contains(...) test you proposed also matches URLs like http://example.net/myfile.pdf.txt. Compare only the end of the URL:

    //a[@href[substring(., string-length(.) - 3) = '.pdf']]/@href
    

    And you missed a path step for the attribute, so you've been trying to average the string length of the link names right now.
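
For the XPath 1.0 route (fetch the URLs, then average outside XPath), here is a minimal sketch in Python. It is only an illustration under a few assumptions: the page is well-formed XML, the sample markup and the helper name average_pdf_url_length are made up, and because the standard library's ElementTree supports only a subset of XPath 1.0 (no substring() or string-length()), the "ends in .pdf" test is done in Python rather than in the path expression.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page; the URLs are made up for illustration.
PAGE = """<html><body>
  <a href="http://example.net/a.pdf">A</a>
  <a href="http://example.net/myfile.pdf.txt">B</a>
  <a href="http://example.net/long-name.pdf">C</a>
  <a name="anchor-without-href">D</a>
</body></html>"""


def average_pdf_url_length(xml_text):
    """Return the mean length of href values ending in '.pdf', or None if there are none."""
    root = ET.fromstring(xml_text)
    # ElementTree handles only simple paths and predicates, so select every
    # <a> element that has an href attribute and do the suffix test in Python.
    hrefs = [a.get("href") for a in root.findall(".//a[@href]")]
    pdf_urls = [h for h in hrefs if h.endswith(".pdf")]
    if not pdf_urls:
        return None  # avoid dividing by zero when no link matches
    return sum(len(u) for u in pdf_urls) / len(pdf_urls)


print(average_pdf_url_length(PAGE))
```

With a full XPath 1.0 engine you could instead evaluate the substring(...) expression above to select the attributes and keep only the averaging step in the host language.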