
XPath expressions for extracting information from AWIS XML data

I somehow can't manage to extract information from AWIS results (containing Alexa data).

I have a bunch of XML files containing AWIS data, from which I want to extract pieces of information such as Rank and PageViews for a 3-month period.

The two (colliding) namespaces are somehow confusing and my XPath expressions are not working as intended. (Even a simple //aws:Rank/text() is not working.)

It would be great if somebody could help me get going.

Currently I am using the JDOM library, but I wouldn't mind using something else. This is a tiny example that does not work as expected:

Document doc = new SAXBuilder().build(file);
XPath xpath = XPath.newInstance("//aws:Rank");
xpath.addNamespace("aws", "");
Element rank = (Element) xpath.selectSingleNode(doc);

I'd prefer to use javax.xml though...

Here's an example of the XML:

<?xml version="1.0"?>
<aws:UrlInfoResponse xmlns:aws="">
<aws:Response xmlns:aws="">

    <aws:DataUrl type="canonical"></aws:DataUrl>
      <aws:PhoneNumber>+33 140289796</aws:PhoneNumber>
    <aws:OwnerName>John Fay</aws:OwnerName>
        <aws:Street>22 rue Saint Sauveur</aws:Street>
      <aws:City>Paris 75002,</aws:City>
    <aws:DataUrl type="canonical"></aws:DataUrl>
      <aws:Title>Ah Paris</aws:Title>
      <aws:Description>Short term apartment rentals. Search engine, descriptions, photos, rates.</aws:Description>
    <aws:DataUrl type="canonical"></aws:DataUrl>
<aws:ResponseStatus xmlns:aws="">


  • It looks like a typo in the namespace URI - your code has

    xpath.addNamespace("aws", "");

    (with a trailing slash) but the document has


    (without the slash).

    "I'd prefer to use javax.xml though..."

    Namespace handling is a real pain in javax.xml.xpath, because the Java class library provides no default implementation of the NamespaceContext interface. You have to either implement your own or use a third-party implementation (I usually go for the SimpleNamespaceContext from Spring). If you're going to be doing a lot of XPath manipulation, I'd suggest looking at Saxon 9 (the HE version is free of charge) and using its s9api interface, as this supports the far more powerful version 2.0 of the XPath language.
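    To make the "implement your own" option concrete, here is a minimal sketch of a map-backed NamespaceContext used with javax.xml.xpath. The two namespace URIs, the class name AwisXPathSketch, the helper names, the prefixes "outer" and "inner", and the sample document are all placeholders I made up for illustration; substitute the exact xmlns:aws values from your own files, character for character. Because the same aws prefix is rebound on a nested element, the sketch gives each URI its own prefix on the XPath side (prefixes in the expression do not have to match those in the document). Note also that the DocumentBuilderFactory must be made namespace-aware, otherwise prefixed XPath steps will not match.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class AwisXPathSketch {

    // Placeholder URIs (assumptions, not the real AWIS values).
    static final String OUTER_NS = "http://example.com/awis-outer";
    static final String INNER_NS = "http://example.com/awis-inner/";

    // Hypothetical sample: the same "aws" prefix is bound to two different URIs.
    static final String SAMPLE =
          "<aws:UrlInfoResponse xmlns:aws=\"" + OUTER_NS + "\">"
        + "<aws:Response xmlns:aws=\"" + INNER_NS + "\">"
        + "<aws:Rank>1234</aws:Rank>"
        + "</aws:Response>"
        + "</aws:UrlInfoResponse>";

    /** Minimal NamespaceContext backed by a prefix-to-URI map. */
    static final class MapNamespaceContext implements NamespaceContext {
        private final Map<String, String> prefixToUri;

        MapNamespaceContext(Map<String, String> prefixToUri) {
            this.prefixToUri = prefixToUri;
        }

        @Override
        public String getNamespaceURI(String prefix) {
            if (prefix == null) throw new IllegalArgumentException("prefix is null");
            // Unknown prefixes map to the "no namespace" URI.
            return prefixToUri.getOrDefault(prefix, XMLConstants.NULL_NS_URI);
        }

        @Override
        public String getPrefix(String uri) {
            if (uri == null) throw new IllegalArgumentException("uri is null");
            for (Map.Entry<String, String> e : prefixToUri.entrySet()) {
                if (e.getValue().equals(uri)) return e.getKey();
            }
            return null;
        }

        @Override
        public Iterator<String> getPrefixes(String uri) {
            String prefix = getPrefix(uri);
            return prefix == null
                ? Collections.<String>emptyIterator()
                : Collections.singleton(prefix).iterator();
        }
    }

    /** Parses the XML namespace-aware and returns the string value of the expression. */
    static String evaluate(String xml, String expression) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // off by default; required for prefixed steps
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

        Map<String, String> bindings = new HashMap<>();
        bindings.put("outer", OUTER_NS);
        bindings.put("inner", INNER_NS);

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new MapNamespaceContext(bindings));
        return xpath.evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        // Rank lives in the inner namespace, so the "inner" prefix matches it.
        System.out.println(evaluate(SAMPLE, "//inner:Rank"));
    }
}
```

    With the placeholder document above, //inner:Rank selects the element, while //outer:Rank selects nothing, because an element's namespace is determined by the URI in scope at that element and not by the prefix text. The same trailing-slash sensitivity applies here as in JDOM: the URI registered in the context has to equal the one in the document exactly.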