php regex split text-parsing delimited

Split a string by commas which are not inside potentially nested parentheses


Two days ago I started working on a code parser and I'm stuck.

How can I split a string on commas that are not inside parentheses? Let me show you what I mean.

I have this string to parse:

one, two, three, (four, (five, six), (ten)), seven

I would like to get this result:

array(
  "one",
  "two",
  "three",
  "(four, (five, six), (ten))",
  "seven"
)

but instead I get:

array(
  "one",
  "two",
  "three",
  "(four",
  "(five",
  "six)",
  "(ten))",
  "seven"
)

How can I do this with a regular expression in PHP?


Solution

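  • If the parentheses are never nested, a simple pattern is enough (for the nested sample string it would cut the group at the first closing parenthesis):

    preg_match_all('/[^(,\s]+|\([^)]+\)/', $str, $matches); // tokens end up in $matches[0]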
    

    For nested parentheses it is better to use a real parser that tracks the nesting depth character by character. Maybe something like this:

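    $str = 'one, two, three, (four, (five, six), (ten)), seven';
    $buffer = '';      // characters of the token being built
    $stack = array();  // completed top-level tokens
    $depth = 0;        // current parenthesis nesting depth
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        $char = $str[$i];
        switch ($char) {
        case '(':
            $depth++;
            break;
        case ',':
            if (!$depth) {
                // Top-level comma: finish the current token.
                if ($buffer !== '') {
                    $stack[] = $buffer;
                    $buffer = '';
                }
                // In PHP, switch counts as a loop for continue,
                // so "continue 2" jumps to the next for iteration.
                continue 2;
            }
            break;
        case ' ':
            if (!$depth) {
                // Drop spaces outside parentheses.
                continue 2;
            }
            break;
        case ')':
            if ($depth) {
                $depth--;
            } else {
                // Unbalanced ')': end the current token here.
                $stack[] = $buffer.$char;
                $buffer = '';
                continue 2;
            }
            break;
        }
        $buffer .= $char;
    }
    // Flush whatever remains after the last comma.
    if ($buffer !== '') {
        $stack[] = $buffer;
    }
    var_dump($stack);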
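
If you would still rather stay with a regular expression, PCRE's recursive subpatterns can match balanced parentheses, so nested groups stay in one piece. The following is only a sketch under a few assumptions: the function name splitTopLevel is made up for illustration, tokens are trimmed, empty tokens are dropped, and unbalanced parentheses are silently skipped rather than reported.

```php
<?php
// Hypothetical helper (not part of the original answer): split on
// top-level commas, keeping balanced "(...)" groups intact.
function splitTopLevel($str)
{
    // A token is a run of either
    //   - characters that are not parentheses or commas, or
    //   - a balanced parenthesised group (capture group 1, which
    //     recurses into itself via (?1) for nested groups).
    $pattern = '/(?:[^(),]++|(\((?:[^()]++|(?1))*\)))+/';
    preg_match_all($pattern, $str, $matches);

    // $matches[0] holds the full matches; trim surrounding whitespace
    // and drop tokens that are empty after trimming.
    $tokens = array_map('trim', $matches[0]);
    return array_values(array_filter($tokens, function ($t) {
        return $t !== '';
    }));
}

var_dump(splitTopLevel('one, two, three, (four, (five, six), (ten)), seven'));
```

Unlike the loop above, this variant keeps spaces inside top-level tokens and keeps a prefix attached to its group, so 'f(a, b), c' yields 'f(a, b)' and 'c'. The loop is easier to extend if you later need error reporting for unbalanced input.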