Tags: python, tatsu

TatSu: How to validate/process arithmetic-like expressions in semantic actions


I have a TatSu grammar where I am parsing arithmetic expressions like SignalGroup {ABUS='A3 + A2 + A1 + A0';}.

Relevant grammar:

#--------------------------------------------- SIGNAL GROUPS BLOCK -----------------------------------------------------
signal_groups_block::SignalGroups
    =
    'SignalGroups' ~ [id:identifier] '{' ~ objs:{signal_group | annotation | udb} '}'
    ;
signal_group::SignalGroup
    =
    id:identifier '=' expr:signal_reference_expr ( ';' |
                                                 ('{' ~ attr:{signal_statement | annotation | udb} '}') )
    ;
signal_reference_expr
    =
    signal_name_array_opt
    | "'" ~ exprs:signal_reference_expr_items "'"
    ;
signal_reference_expr_items
    =
    left:signal_reference_expr_items op:'+' ~ right:signal_reference_expr_item
    | left:signal_reference_expr_items op:'-' ~ right:signal_reference_expr_item
    | left:signal_reference_expr_item
    ;
signal_reference_expr_item
    =
    '(' ~ @:signal_reference_expr_items ')'
    | signal_name_array_opt
    ;
signal_name_array_opt
    =
    id:identifier ['[' ~ msb:integer ['..' ~ lsb:integer] ']']
    ;

The AST output:

{
  "objs": [
    {
      "__class__": "SignalGroup",
      "expr": {
        "exprs": {
          "left": {
            "left": {
              "left": {
                "left": {
                  "id": "A3",
                  "lsb": null,
                  "msb": null
                },
                "op": null,
                "right": null
              },
              "op": "+",
              "right": {
                "id": "A2",
                "lsb": null,
                "msb": null
              }
            },
            "op": "+",
            "right": {
              "id": "A1",
              "lsb": null,
              "msb": null
            }
          },
          "op": "+",
          "right": {
            "id": "A0",
            "lsb": null,
            "msb": null
          }
        }
      },
      "attr": null,
      "id": "ABUS"
    }
  ],
  "id": null
}

I would like to do some semantic validation on this rule. That is, check that signals A3-A0 have been declared in some other (signal) block. If not declared, raise an error. I have kept a naming (symbol) table of all the signals for lookup while parsing the other (signal) block. I would like to know the best way to 'walk' such an AST within the semantic action code, as it can be very deep if my expression contains, say, 200 signals (i.e., A0 + A1 + .. A199). Right now, I only have a stub function like so:

class STILSemantics(ModelBuilderSemantics):

    ....

    def signal_groups_block(self, ast, node):
        """Signal groups block."""

        log.info('Parse %s block', node)
        print('got here')
        from tatsu.util import asjsons
        print(asjsons(ast))

        # TODO: HOW TO WALK THE AST HERE????

        return ast

I checked the TatSu docs and there is a section on Walking Models, but it seems that only applies AFTER the full model AST has been built. Maybe I am wrong. Is there a way to efficiently walk the AST of the signal_groups_block inside a semantic action while the entire (top-level) model is still being built?

Reference: https://tatsu.readthedocs.io/en/stable/models.html#walking-models


Solution

  • To check for pre-defined identifiers during the parse, you need a Symbol Table.

    You add symbols to the table in the semantics for the grammar clauses in which they are defined, and consult the table in the semantics for the grammar clauses in which they are used (a sketch of this approach follows below).

    Because TatSu preserves full information about the source input, it may be easier to check those semantics after the parse, using a walker (also sketched below). The errors reported can then be precise down to the line and column number, and users usually don't mind that syntactic errors are reported first and semantic errors later, because TatSu parsers normally stop at the first error (there is support for parse recovery in TatSu, but it is undocumented).

    Symbol tables during the parse phase are necessary only in languages in which a token may or may not be a keyword depending on context (yes, PEG can handle some context-sensitive cases with the help of semantic actions).
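
Here is a minimal sketch of the during-parse route, with a few assumptions: the rule that declares a signal is called signal here (substitute whatever rule your Signals block actually uses, and note that declarations must not themselves go through signal_name_array_opt), and declared is just a set kept on the semantics object. Because every use of a name in the expression is parsed by signal_name_array_opt, the check is flat and never has to walk the left-nested '+' chain, no matter how many signals the expression contains:

from tatsu.exceptions import FailedSemantics
from tatsu.semantics import ModelBuilderSemantics


class STILSemantics(ModelBuilderSemantics):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.declared = set()  # signal names recorded while parsing the declaring block

    def signal(self, ast):
        # hypothetical declaration rule: record the name for later lookups
        self.declared.add(ast.id)
        return ast

    def signal_name_array_opt(self, ast):
        # every use of a signal name goes through this rule, so checking here
        # validates each name exactly once, with no walking of the expression tree
        if ast.id not in self.declared:
            raise FailedSemantics('undeclared signal: %s' % ast.id)
        return ast

Raising FailedSemantics makes the rule fail with that message, just like an ordinary parse failure; if you prefer to report all problems at once, collect them in a list instead and check it after the parse.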
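
And here is a sketch of the post-parse route with a walker, assuming the model is built with ModelBuilderSemantics so that SignalGroups/SignalGroup nodes exist. The walker and its method names are mine, and the loop only handles the flat A3 + A2 + ... shape shown in the question's AST (parenthesized sub-expressions would need an explicit stack). Because the '+'/'-' chain is left-nested, it is walked with a loop rather than recursion, so a 200-signal expression is not a problem:

from tatsu.walkers import NodeWalker


class SignalUseWalker(NodeWalker):
    def __init__(self, declared):
        super().__init__()
        self.declared = declared  # the symbol table built elsewhere
        self.errors = []

    def walk_object(self, node):
        # fallback for nodes this walker does not care about
        return node

    def walk_SignalGroups(self, node):
        for obj in node.objs or []:
            self.walk(obj)

    def walk_SignalGroup(self, node):
        expr = node.expr or {}
        item = expr.get('exprs', expr)  # quoted expression, or a bare name
        # iterate down the left-nested chain instead of recursing
        while item is not None:
            if item.get('right') is not None:
                self.check(item['right'])
                item = item['left']
            elif item.get('id') is not None:
                self.check(item)  # innermost bare signal name
                item = None
            else:
                item = item.get('left')

    def check(self, name):
        signal = name.get('id')
        if signal not in self.declared:
            self.errors.append('undeclared signal: %s' % signal)

After the parse, something like SignalUseWalker(declared).walk(signal_groups_node) collects the undeclared names in errors (add walk_ methods for the enclosing nodes if you want the walker to descend from the top-level model); if the parser is created with parseinfo enabled, each node's parseinfo gives the line and column to report.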