phpmysqlregexsql-parser

SQL RegEx Create Split: Include SET


I'm writing a regex pattern to split MySQL CREATE statements into column definition arrays. So far, it works great for everything aside from SET columns. Here's the process:

$lines = explode("\n", $create_sql);

foreach ($lines as $line) {
    $line = str_replace('NOT NULL', 'NOT_NULL', $line);

    $pattern = '/`(.*)`[\s]*([^\s]*)[\s]*(NOT_NULL|NULL)[\s]*(.*),/';

    $search = preg_match($pattern, $line, $matches);

    if ($search !== false && count($matches) === 5) {
        $columns[$matches[1]] = array(
            'type' => $matches[2],
            'null' => $matches[3] === 'NULL',
            'extra' => $matches[4],
        );
    }
}

This process works great on a column with a definition like this:

`id`          INT(11)               NOT NULL         AUTO_INCREMENT,

...but fails on SET columns. How can I best accommodate for the extra quotes and parentheses in this line?

`platform`    SET('iOS', 'Android') NULL             DEFAULT NULL,

To summarize, I need to integrate a pattern to match SET('one', 'two', n) in a string.


Solution

  • First just let slightly modify your pattern expression, first for readability (\s* instead of [\s]*), then to ensure working even with lowercase statements :

    $pattern = '/`(.*)`\s*([^\s]*)\s*(NOT_NULL|NULL)\s*(.*),/i';
    

    Then, to answer your question, prepare the pattern depending on a SET is present or not in the line:

    $sub_pattern = stripos($line, ' SET(') ?
      'SET\([^\)]*\)\s*[^\s]*' 
      : '[^\s]*';
    $pattern = '/`(.*)`\s*(' . $sub_pattern . ')\s*(NOT_NULL|NULL)\s*(.*),/i';
    

    Note that, however, this code is not secured against syntax errors in the source lines.