About PHP regexp for a recursive pattern


I have this code:

$string="some text {@block}outside{@block}inside{@}outside{@} other text";

function catchPattern($string,$layer){
  preg_match_all(
    "/\{@block\}".
      "(".
        "(".
           "[^()]*|(?R)".
        ")*".
      ")".
    "\{@\}/",$string,$nodes);
  if(count($nodes)>1){
    for($i=0;$i<count($nodes[1]); $i++){
      if(is_string($nodes[1][$i])){
        if(strlen($nodes[1][$i])>0){
          echo "<pre>Layer ".$layer.":   ".$nodes[1][$i]."</pre><br />";
          catchPattern($nodes[1][$i],$layer+1);
        }
      }
    }
  }
}

catchPattern($string,0);

That gives me this output:

Layer 0:   outside{@block}inside{@}outside

Layer 1:   inside

And everything is OK! But if I change the string and the regexp slightly:

$string="some text {@block}outside{@block}inside{@end}outside{@end} other text";

function catchPattern($string,$layer){
  preg_match_all(
    "/\{@block\}".
      "(".
        "(".
           "[^()]*|(?R)".
        ")*".
      ")".
    "\{@end\}/",$string,$nodes);
  if(count($nodes)>1){
    for($i=0;$i<count($nodes[1]); $i++){
      if(is_string($nodes[1][$i])){
        if(strlen($nodes[1][$i])>0){
          echo "<pre>Layer ".$layer.":   ".$nodes[1][$i]."</pre><br />";
          catchPattern($nodes[1][$i],$layer+1);
        }
      }
    }
  }
}

catchPattern($string,0);

I don't get any output. Why? I expected the same output.


Solution

  • The problem is that the backtracking limit is exhausted: preg_match_all() gives up and returns false, so nothing is printed. You can always raise the limit (the pcre.backtrack_limit ini setting). However, in the cases I have come across, rewriting the regex is the better solution.
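    Before rewriting anything, it helps to make this kind of failure visible: preg_match_all() returns false (not 0) when PCRE gives up, and preg_last_error() tells you why. The sketch below wraps that check in a small helper. The name matchAllOrThrow() is made up for this example, and whether the question's pattern actually hits PREG_BACKTRACK_LIMIT_ERROR on a given machine depends on the PHP/PCRE version and on pcre.backtrack_limit.

```php
<?php
// Hypothetical helper (not part of PHP): run preg_match_all() and throw
// instead of silently returning false when PCRE fails.
function matchAllOrThrow(string $pattern, string $subject): array
{
    $matches = [];
    if (preg_match_all($pattern, $subject, $matches) === false) {
        $code = preg_last_error();
        $reason = $code === PREG_BACKTRACK_LIMIT_ERROR
            ? 'backtrack limit exhausted (see pcre.backtrack_limit)'
            : "PCRE error code $code";
        throw new RuntimeException("preg_match_all() failed: $reason");
    }
    return $matches;
}

// The pattern from the question that produced no output.
$subject = "some text {@block}outside{@block}inside{@end}outside{@end} other text";
$pattern = '/\{@block\}(([^()]*|(?R))*)\{@end\}/';

try {
    print_r(matchAllOrThrow($pattern, $subject)[1]);
} catch (RuntimeException $e) {
    echo $e->getMessage(), "\n";
}
```

    Raising pcre.backtrack_limit with ini_set() may let the original pattern succeed on this input, but a longer input can run into the limit again.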

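    You can't just modify an existing regex arbitrarily and expect it to work, especially a recursive one. It seems that you took an existing bracket-matching regex and adapted it. There are a few problems in your regex:

    • [^()]* excludes parentheses, a leftover from the bracket-matching regex. Your delimiters are {@block} and {@end}, so the class should exclude braces instead: [^{}]. As written, [^()]* happily runs straight over the tags.
    • [^()]* can match the empty string and sits inside another *, i.e. the shape (x*)*. The engine can then split the same stretch of text into iterations in an exponential number of ways, and it may try a great many of them before it finds (or gives up on) a match. That is what exhausts the backtracking limit. Use + inside instead: [^{}]+.
    • At any position, at most one of the two alternatives can match[1], so backtracking into them can never produce a different match. That makes it safe to use possessive quantifiers (++ and *+), which tell the engine not to backtrack at all. The inner group doesn't need to capture either, so it becomes (?:...).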
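    Here is a sketch of the corrected regex written with PCRE's x (extended) modifier, so each piece can carry its own comment. Whitespace and # comments are ignored in this mode; ~ is used as the delimiter only so that nothing inside the comments can end the pattern early.

```php
<?php
// The corrected pattern in extended (x) mode: whitespace and "#" comments
// are ignored by PCRE, so each piece can be annotated.
$pattern = <<<'RE'
~
\{@block\}        # opening tag
(                 # group 1: everything between the outer tags
  (?:             # non-capturing group, repeated below
      [^{}]++     # a run of text without braces (possessive)
    | (?R)        # or a complete nested block (whole-pattern recursion)
  )*+             # zero or more of those, possessive
)
\{@end\}          # closing tag
~x
RE;

$subject = "some text {@block}outside{@block}inside{@end}outside{@end} other text";
preg_match_all($pattern, $subject, $nodes);
print_r($nodes[1]);
```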

    The final solution is:

    /\{@block\}((?:[^{}]++|(?R))*+)\{@end\}/
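    For completeness, here is the question's catchPattern() reworked around that regex. It is a sketch rather than a drop-in replacement: instead of echoing directly it returns [layer, content] pairs so the result is easy to check, and it throws if preg_match_all() fails, so a PCRE error can no longer look like "no output".

```php
<?php
// Reworked catchPattern(): collects [layer, content] pairs depth-first,
// in the same order the original echoed them.
function catchPattern(string $string, int $layer = 0): array
{
    $found = [];
    $pattern = '/\{@block\}((?:[^{}]++|(?R))*+)\{@end\}/';
    if (preg_match_all($pattern, $string, $nodes) === false) {
        throw new RuntimeException('preg_match_all() failed, PCRE error ' . preg_last_error());
    }
    foreach ($nodes[1] as $inner) {
        if ($inner === '') {
            continue; // empty block: nothing to report or recurse into
        }
        $found[] = [$layer, $inner];
        // Recurse into the block body to collect the next layer.
        foreach (catchPattern($inner, $layer + 1) as $pair) {
            $found[] = $pair;
        }
    }
    return $found;
}

$string = "some text {@block}outside{@block}inside{@end}outside{@end} other text";
foreach (catchPattern($string) as [$layer, $content]) {
    // htmlspecialchars() because the content is written into HTML.
    echo "<pre>Layer $layer:   " . htmlspecialchars($content) . "</pre><br />\n";
}
```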
    


    Footnotes

    1: This follows from the fact that text matched by [^{}]+ never starts with {, while text matched by the recursion (?R) must start with {, since the whole pattern begins with \{@block\}.
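    The footnote's argument can be spot-checked on a few inputs (a sanity check, not a proof): the possessive pattern and the same pattern with ordinary greedy quantifiers should capture exactly the same text, because backtracking into [^{}]+ or (?R) can never lead to a different match. The sample strings below are made up for the check.

```php
<?php
// Same pattern twice: with possessive quantifiers and with plain greedy ones.
$possessive = '/\{@block\}((?:[^{}]++|(?R))*+)\{@end\}/';
$greedy     = '/\{@block\}((?:[^{}]+|(?R))*)\{@end\}/';

// Made-up inputs: nested, sibling, unclosed and stray-brace cases.
$samples = [
    "some text {@block}outside{@block}inside{@end}outside{@end} other text",
    "{@block}a{@block}b{@block}c{@end}b{@end}a{@end}",
    "{@block}one{@end} and {@block}two{@end}",
    "{@block}unclosed{@block}inner{@end}",
    "{@block}stray { brace{@end}",
];

foreach ($samples as $s) {
    preg_match_all($possessive, $s, $p);
    preg_match_all($greedy, $s, $g);
    echo ($p[1] === $g[1] ? 'same' : 'DIFFERENT'), ': ', json_encode($p[1]), "\n";
}
```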