wordpress syntax-highlighting geshi

Improve GeSHi syntax highlighting for T-SQL


I'm using WP-GeSHi in WordPress, and largely I'm very happy with it. There are, however, a few minor scenarios where the color highlighting is too aggressive when a keyword is:

  1. a variable name (denoted by a leading @)
  2. part of another word (e.g. IN in INSERTED)
  3. the combination (part of a variable name, e.g. JOIN and IN in @JOINBING)
  4. inside square brackets (e.g. [status])

Certain keywords are case sensitive, and others are not. The screenshot below sums up the various cases where this goes wrong:

[Screenshot: T-SQL sample showing the incorrect highlighting in the cases listed above]

Now, the code in GeSHi.php is pretty verbose, and I am by no means a PHP expert. I'm not afraid to get my hands a little dirty here, but I'm hoping someone else has made corrections to this code and can provide some pointers. I already implemented a workaround to prevent @@ROWCOUNT from being highlighted incorrectly, but this was easy, since @@ROWCOUNT is defined - I just shuffled the arrays around so that it was found before ROWCOUNT.

What I'd like is for GeSHi to completely ignore keywords that aren't whole words (whether they are prefixed by @ or immediately surrounded by other letters/numbers). JOIN should be grey, but @JOIN and JOINS should not. I'd also like it to ignore keywords that are inside square brackets (after all, this is how we tell Management Studio to not color highlight it, and it's also how we tell the SQL engine to ignore reserved words, keywords, and invalid identifiers).


Solution

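  • You can do this by adding a PARSER_CONTROL entry to the end of the language array. It overrides, per keyword group, the regex fragments that GeSHi places before and after each keyword: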

    'PARSER_CONTROL' => array(
        'KEYWORDS' => array(
            1 => array( // "1" maps to the main keywords near the start of the array
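                'DISALLOWED_BEFORE' => '(?<![\(\w])', // negative lookbehind: no "(" or word character before the keyword
                'DISALLOWED_AFTER' => '(?![\(\w])'    // negative lookahead: no "(" or word character after the keyword
            ),
            5 => array( // "5" maps to the shorter keywords like "IN" that are further down
                'DISALLOWED_BEFORE' => '(?<![\(\w])',
                'DISALLOWED_AFTER' => '(?![\(\w])'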
            ),
        )
    )
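    To see how such lookarounds behave outside GeSHi, here is a minimal standalone sketch. It is only an approximation of how GeSHi wraps a keyword pattern, not its actual internals. The function name match_keywords is hypothetical, and the extra `@`, `[` and `]` in the character classes are an assumption added to cover the variable and square-bracket cases from the question; they are not part of the configuration above.

    ```php
    <?php
    // Hypothetical helper: returns the keywords that would be matched as whole words.
    // This mimics "DISALLOWED_BEFORE + keyword + DISALLOWED_AFTER" with plain preg_*.
    function match_keywords(string $sql): array
    {
        // Assumption: '@' and '[' are added so @JOIN and [in] are skipped too.
        $before = '(?<![@\[\(\w])'; // negative lookbehind
        // Assumption: ']' is added to the lookahead for symmetry.
        $after  = '(?![\]\w])';     // negative lookahead

        // Two sample keywords only; /i makes the match case insensitive.
        $pattern = '/' . $before . '(JOIN|IN)' . $after . '/i';

        preg_match_all($pattern, $sql, $matches);
        return $matches[0];
    }

    $sql = 'SELECT * FROM a JOIN b WHERE @JOINBING IN (SELECT [in] FROM INSERTED)';
    print_r(match_keywords($sql));
    ```

    With this sample, only the standalone JOIN and IN should be returned; @JOINBING, [in] and INSERTED are skipped because a disallowed character sits directly before or after the keyword.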
    

    Edit

    I've modified your gist to move some of the keywords you added to SYMBOLS back to KEYWORDS (though in their own group and with your custom style). I also updated the PARSER_CONTROL array to match the new keyword array indexes and to include the default regex that GeSHi generates. Here is the link:

    https://gist.github.com/jamend/07e60bf0b9acdfdeee7a