phpregexreplacecycle-detection

Find circular replacement patterns


Suppose I would like to expand a string by replacing placeholders from a dictionary. The replacement strings can also contain placeholders:

$pattern = "#a# #b#";
$dict = array("a" => "foo", "b" => "bar #c#", "c" => "baz");
while($match_count = preg_match_all('/#([^#])+#/', $pattern, $matches)) {
    for($i=0; $i<$match_count; $i++) {
        $key = $matches[1][$i];
        if(!isset($dict[$key])) { throw new Exception("'$key' not found!"); }
        $pattern = str_replace($matches[0][$i], $dict[$key], $pattern); 
     }   
}
echo $pattern;

This works fine, as long as there are no circular replacement patterns, for example "c" => "#b#". Then the program will be thrown into an endless loop until the memory is exhausted.

Is there an easy way to detect such patterns? I'm looking for a solution where the distance between the replacements can be arbitrarily long, eg. a->b->c->d->f->a
Ideally, the solution would also happen while in the loop and not with a separate analysis.


Solution

  • Thanks to the comment by georg and this post, I came up with a solution that converts the patterns into a graph and uses topological sort to check for circular replacement.

    Here is my solution:

    $dict = array("a" => "foo", "b" => "bar #c#", "c" => "baz #b#");
    
    # Store incoming and outgoing "connections" for each key => pattern replacement
    $nodes = array();
    foreach($dict as $patternName => $pattern) {
        if (!isset($nodes[$patternName])) {
            $nodes[$patternName] = array("in" => array(), "out" => array());
        }
        $match_count = preg_match_all('/#([^#])+#/', $pattern, $matches);
        for ($i=0; $i<$match_count; $i++) {
            $key = $matches[1][$i];
            if (!isset($dict[$key])) { throw new Exception("'$key' not found!"); }
            if (!isset($nodes[$key])) {
                $nodes[$key] = array("in" => array(), "out" => array());
            }
            $nodes[$key]["in"][]          = $patternName;
            $nodes[$patternName]["out"][] = $key;
         }   
    }
    # collect leaf nodes (no incoming connections)
    $leafNodes = array();
    foreach ($nodes as $key => $connections) {
        if (empty($connections["in"])) {
            $leafNodes[] = $key;
        }
    }
    # Remove leaf nodes until none are left
    while (!empty($leafNodes)) {
        $nodeID = array_shift($leafNodes);
        foreach ($nodes[$nodeID]["out"] as $outNode) {
            $nodes[$outNode]['in'] = array_diff($nodes[$outNode]['in'], array($nodeID));
            if (empty($nodes[$outNode]['in'])) {
                $leafNodes[] = $outNode;
            }
        }
        $nodes[$nodeID]['out'] = array();
    }
    # Check for non-leaf nodes. If any are left, there is a circular pattern
    foreach ($nodes as $key => $node) {
        if (!empty($node["in"]) || !empty($node["out"]) ) {
            throw new Exception("Circular replacement pattern for '$key'!");
        }
    }
    
    # Now we can safely do replacement 
    $pattern = "#a# #b#";
    while ($match_count = preg_match_all('/#([^#])+#/', $pattern, $matches)) {
        $key = $matches[1][$i];
        $pattern = str_replace($matches[0][$i], $dict[$key], $pattern); 
    }
    echo $pattern;