xmlxpathphp-5.2

Can I make this xpath search faster?


<root>
  <a auto='1'>
    <b>
      <c auto="1">
        <d auto="1"></d>
      </c>
    </b>
    <e auto="1">
      <f>
        <g auto="1"></g>
      </f>
    </e>
  </a>
</root>

Find all elements that

  1. Is descendant of context element
  2. Have auto attribute
  3. Highest level (have no ancestor with auto attribute between self and the context element)

So, if the context node is a then c and e should be returned.

I've implemented it in my php class:

$tempId='XDFAY69LA';
$this->setAttribute('tempId',$tempId);
$path=".//\*[@auto and not(ancestor::\*[@auto and ancestor::\*[@tempId='$tempId']])]";
$ar=$this->getElementsByXPath($path);
$this->removeAttribute('tempId');

But I found the query is slowly, maybe .. , because the query is too complex?, And is there a way that do a better job?


I wrote a testing, please have a look:

<?php
  $xml='
    <root>
      <a auto="1" tempId="current">
        <b>
          <c auto="1">
            <d auto="1"></d>
          </c>
        </b>
        <e auto="1">
          <f>
            <g auto="1"></g>
          </f>
        </e>
      </a>
    </root> ';

  $doc=new DomDocument();
  $tempId='XDFAY69LA';
  $doc->loadXml($xml);
  $domxpath=new DOMXPath($doc);
  $a=$domxpath->query('a')->item(0);
  $path=".//*[@auto and not(ancestor::*[@auto and ancestor::*[@tempId='$tempId']])]";
  $start=microtime(true);
  for($n=0;$n<1000;$n++){ //run 1000 times
    $a->setAttribute('tempId',$tempId);
    $ar=$domxpath->query($path,$a);
    $a->removeAttribute('tempId');
    for($i=0;$i<$ar->length;$i++){
      $node=$ar->item($i);
      //echo $node->tagName . "\n";
    }
  }
  $cost=round(1000 * (microtime(true)-$start));
  echo "time cost: $cost";
?>

Solution

  • Use:

    .//*[@auto and $tempId = ancestor::*[@auto][1]/@tempId]
    

    This selects all descendent elements (of the context node) that have an auto attribute and whose first ancestor that has auto attribute also has a tempId attribute with the same value as that of the tempId attribute of the context node (the latter being stored in the $tempId variable).

    Here we assume that no two different elements have the same value of their tempId attributes.

    A quick XSLT-based verification:

    <xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output omit-xml-declaration="yes" indent="yes"/>
    
     <xsl:template match="a">
       <xsl:variable name="tempId" select="@tempId"/>
    
         <xsl:copy-of select=
          ".//*[@auto and $tempId = ancestor::*[@auto][1]/@tempId]"/>
     </xsl:template>
    </xsl:stylesheet>
    

    When this transformation is applied on the provided XML document:

    <root>
        <a auto="1" tempId="current">
            <b>
                <c auto="1">
                    <d auto="1"></d>
                </c>
            </b>
            <e auto="1">
                <f>
                    <g auto="1"></g>
                </f>
            </e>
        </a>
    </root>
    

    the wanted, correct result (the two elements c and e) is produced:

    <c auto="1">
       <d auto="1"/>
    </c>
    <e auto="1">
       <f>
          <g auto="1"/>
       </f>
    </e>
    

    The performance cannot be improved only within the XPath expression and the inefficiency is due to having to use the // XPath pseudo-operator.

    If using XSLT, it is possible to have an efficient solution using keys:

    <xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output omit-xml-declaration="yes" indent="yes"/>
     <xsl:strip-space elements="*"/>
    
     <xsl:key name="kfirstDescendents" match="*[@auto]"
      use="generate-id(ancestor::*[@auto][1])"/>
    
     <xsl:template match="a">
         <xsl:copy-of select=
          "key('kfirstDescendents', generate-id())"/>
     </xsl:template>
    </xsl:stylesheet>
    

    This transformation produces the same result as the first, and is significantly faster on documents with many nested elements that have an auto attribute.

    If use of XSLT is absolutely ruled out, one can achieve the same effect as XSLT keys with the use of hash tables (sorry, don't know PHP).