
What is the state-of-the-art approach for creating your own rules for Google's Prettify?



I am not talking about changing the colors of existing rules; rather, I want to create new rules of my own.

For example, I want to show a tree and mark all spec.ts files bold, like:

    ├── src
    │   ├── app
    │   │   ├── app-routing.module.ts
    │   │   ├── app.component.css
    │   │   ├── app.component.css.map
    │   │   ├── app.component.html
    │   │   ├── app.component.scss
    │   │   ├── app.component.spec.ts
    │   │   ├── app.component.ts
    │   │   ├── app.module.ts
    │   │   └── lesson
    │   │       ├── lesson.component.css
    │   │       ├── lesson.component.css.map
    │   │       ├── lesson.component.html
    │   │       ├── lesson.component.scss
    │   │       ├── lesson.component.spec.ts
    │   │       └── lesson.component.ts

Solution

  • The easiest way is to work from an existing language handler, such as the Lisp handler that the snippets below are taken from.

    If you look at that file, you can see that it has some boilerplate that surrounds two lists of tuples:

    PR['registerLangHandler'](
        PR['createSimpleLexer'](
            [
              // Some tuples
            ],
            [
              // Some more tuples
            ]),
        [/* Some file extensions without dot */]);
    

    If the file extension list contains "ext" and prettify is asked to prettify a code block with class="lang-ext ..." then this handler will be used.

    The two sets of tuples have a similar structure. Here are two from the first set:

    ['opn',             /^\(+/, null, '('],
    ['clo',             /^\)+/, null, ')'],
    

    and here's one from the second set.

    [PR['PR_KEYWORD'],     /^(?:block|c[ad]+r|catch|con[ds]|def(?:ine|un)|do|eq|eql|...)\b/, null],
    

    The PR['PR_KEYWORD'] is a predefined token type and matches up with styles in the predefined stylesheet:

    .kwd { color: #008 }  /* a keyword */
    

    The tuple

    ['opn',             /^\(+/, null, '('],
    

    says that, when prettifying, if the remaining input starts with text matching /^\(+/, the matched text gets wrapped in <span class="opn">...</span>. opn (LISP open parenthesis) is a string literal because there is no predefined constant for it. If you define your own token type classes, you will probably have to define style rules for them in whatever page loads prettify.

    The string at the right, '(', is treated as a list of characters such that this rule is the only one that applies when the input text starts with one of those characters. This was an important optimization for IE 6 in years past.

    The only difference between the two groups of tuples is that the ones in the first list have this extra exclusive character element.
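
    To make the interplay of the two lists concrete, here is a toy model of the lookup described above. It is only a sketch and not Prettify's real implementation: makeToyLexer and the sample rules are hypothetical names invented for this illustration, and the real lexer does more work than this.

    ```javascript
    // Toy model only: mimics how shortcut rules and fall-through rules are consulted.
    function makeToyLexer(shortcutRules, fallthroughRules) {
      // Map each exclusive character to the single rule that owns it.
      var shortcuts = {};
      shortcutRules.forEach(function (rule) {
        var chars = rule[3];
        for (var i = 0; i < chars.length; i++) {
          shortcuts[chars.charAt(i)] = rule;
        }
      });
      return function lex(source) {
        var tokens = [];
        var pos = 0;
        while (pos < source.length) {
          var rest = source.substring(pos);
          var first = rest.charAt(0);
          // A shortcut character means only that one rule is tried.
          var candidates = Object.prototype.hasOwnProperty.call(shortcuts, first)
              ? [shortcuts[first]] : fallthroughRules;
          var style = 'pln';  // default: plain text, one character at a time
          var text = first;
          for (var i = 0; i < candidates.length; i++) {
            var m = candidates[i][1].exec(rest);
            if (m && m[0].length > 0) {
              style = candidates[i][0];
              text = m[0];
              break;
            }
          }
          tokens.push([style, text]);
          pos += text.length;
        }
        return tokens;
      };
    }

    var toyLex = makeToyLexer(
        [['opn', /^\(+/, null, '('],
         ['clo', /^\)+/, null, ')']],
        [['kwd', /^(?:define|lambda)\b/, null],
         ['pln', /^[^()\s]+/, null],
         ['pln', /^\s+/, null]]);
    var toyTokens = toyLex('((define x))');
    // toyTokens: [['opn','(('], ['kwd','define'], ['pln',' '], ['pln','x'], ['clo','))']]
    ```

    Each pair is a class name plus the text that would end up inside the corresponding span.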

    IIRC, the null supported a feature that was found to be rarely if ever necessary and is no longer supported. Any value you put in that position will be ignored.
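
    Applied to the tree listing in the question, a handler could look roughly like the sketch below. The class name 'spec', the extension 'tree', and the helper registerTreeHandler are made-up names for this example, and the regular expressions assume file names contain no spaces.

    ```javascript
    // Sketch of a handler for directory-tree listings.
    // 'spec' and 'tree' are hypothetical names, not part of Prettify.
    var treeShortcutRules = [];  // no exclusive-character rules are needed here
    var treeFallthroughRules = [
      // A name ending in .spec.ts is wrapped in <span class="spec">.
      ['spec', /^[^\s\u2500-\u257F]+\.spec\.ts(?!\S)/],
      // Any other run of non-space, non-box-drawing characters stays plain.
      ['pln', /^[^\s\u2500-\u257F]+/],
      // Whitespace and box-drawing characters stay plain as well.
      ['pln', /^[\s\u2500-\u257F]+/]
    ];

    function registerTreeHandler(PR) {
      PR['registerLangHandler'](
          PR['createSimpleLexer'](treeShortcutRules, treeFallthroughRules),
          ['tree']);
    }

    // Register only when Prettify has actually been loaded.
    if (typeof PR !== 'undefined') {
      registerTreeHandler(PR);
    }
    ```

    With this in place, a block marked class="prettyprint lang-tree" would use the handler, and the page would still need its own style rule such as .spec { font-weight: bold } to make the matches bold.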

    The CSS handler has some documentation on all this, and demonstrates one other feature.

    ['lang-css-kw', /^(-?(?:[_a-z]|(?:\\[0-9a-f]+ ?))(?:[_a-z0-9\-]|\\(?:\\[0-9a-f]+ ?))*)\s*:/i],
    

    If the token type starts with lang-, instead of generating a <span class="lang-css-kw">...</span>, prettify will look for a language handler for the file extension css-kw and apply that recursively to the content in group 1. This feature is probably overkill here since modern JS engines consistently support lookahead, but it is necessary so that the HTML mode can recursively apply JS and CSS mode to the content of <script> and <style> blocks.
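
    As a small check of what "the content in group 1" means, the snippet below runs the rule quoted above against a sample declaration. The variable names and the sample input are assumptions for illustration only; nothing here calls into Prettify itself.

    ```javascript
    // The rule quoted above from the CSS handler.
    var cssKeyRule = ['lang-css-kw', /^(-?(?:[_a-z]|(?:\\[0-9a-f]+ ?))(?:[_a-z0-9\-]|\\(?:\\[0-9a-f]+ ?))*)\s*:/i];

    var cssMatch = cssKeyRule[1].exec('font-weight: bold');
    var wholeToken = cssMatch[0];  // text consumed by the rule, including the colon
    var delegated = cssMatch[1];   // group 1: the part passed to the nested handler
    // Stripping the lang- prefix gives the extension of the handler to look up.
    var nestedExtension = cssKeyRule[0].substring('lang-'.length);
    ```

    Here wholeToken is 'font-weight:', delegated is 'font-weight', and nestedExtension is 'css-kw'.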


    Prettify can handle any transformation that relies only on a left-to-right pass over tokens. It has no way to collect symbols in side tables for disambiguation, so it cannot distinguish these two C snippets:

    typedef int t;  // t is declared as a type
    t* x;   // declare x as a pointer to a t. "t" should have class="typ"
    

    from

    int t = 1;  // t is declared as a variable, not a type
    t* x;   // multiplication.  "t" should not have class="typ"
    

    This limited approach seems more robust when faced with the small fragments and malformed code that you often see on sites like Stack Overflow.

    It's reasonable to encode common language conventions like

    in your rules.