
Dyalog APL Parser


I'm looking for a parser for Dyalog APL similar to aplparse, but one written either in APL itself or offered as part of the Dyalog engine. I'll be manipulating the AST in an APL program, which is why I need the output of the parser to be native to the language. Note that my question is not a duplicate of this thread since I have access to the full definition of the APL function that is to be parsed.


Solution

  • Aaron Hsu and I are actively working together on this with Hsu carrying most of the weight. This will be part of the Co-dfns project, but even with access to the full source code, it's quite tricky and there are several tradeoffs that need to be made:

    1. Does your code use tradfns with semiglobals? Parsing a b←c when a and/or b are semiglobals depends on runtime ⎕NC information: the same text can parse as a stranded assignment, a modified assignment, or a single assignment with result passthrough. Static parsing requires that the ⎕NC of a semiglobal never changes from a value type to a function/operator type.

    2. Does your code dynamically bind names with Execute (⍎), ⎕FX, ⎕FIX, ⎕NS, ⎕WC, ⎕CY, or ⎕LOAD? Fully parsing these immediately runs up against the halting problem, so we have to handle them with heuristics that can be adversarially worked around if you're unfortunate enough to be parsing arbitrary code from an unknown source.
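
    To make the first point concrete, here is a small sketch of how the same text a b←c behaves under different name classes. It assumes the lines are typed one at a time into a recent Dyalog session; r1, r2 and r3 are names introduced here only to capture the results.

    ```apl
    ⍝ Case 1: a and b are both arrays (⎕NC 2), so this is a stranded assignment.
    a←0
    b←0
    a b←10 20
    r1←a b          ⍝ r1 is 10 20

    ⍝ Case 2: b is a dyadic function (⎕NC 3), so this is a modified assignment (a←a b 5).
    ⎕EX 'a' 'b'
    a←10
    b←{⍺+⍵}
    a b←5
    r2←a            ⍝ r2 is 15

    ⍝ Case 3: a is a monadic function, so b←5 is a single assignment
    ⍝ whose result passes through to a.
    ⎕EX 'a' 'b'
    a←{-⍵}
    r3←a b←5        ⍝ b is 5, r3 is ¯5
    ```

    A static parser sees the identical token sequence in all three cases, which is why it needs the name classes of a and b to stay fixed.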

    Anyway, with that in mind, you can get up and running with Co-dfns right now, specifically by running PS. Your code needs to all sit together in a scripted namespace, at which point AST-generation looks something like the below:

          ⍝ Load Co-dfns into your workspace
          ]link.import # path/to/Co-dfns/ws
          LOAD 'path/to/Co-dfns'
    
          ⍝ Run the parser
          codfns.PS⊃⎕NGET 'path/to/namespace.apln' 1
    

    However, if successful, that will just spill the AST to your session, so you probably want to bind the result to some variables:

          (p d t k n lx pos end)(xn xt)sym IN←codfns.PS⊃⎕NGET 'namespace.apln' 1
    

    Note that the AST is represented as an Apter tree with parent vector p and depth vector d. This is really a case where the code is documentation at the moment, but if you need help or have questions, feel free to contact Aaron or me.
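
    As a rough illustration of how the two vectors relate, the sketch below derives a parent vector from a depth vector on a toy tree. It is not taken from the Co-dfns source: the depth vector is made up, and it assumes ⎕IO←0, nodes listed in depth-first preorder, and a root that is its own parent. The helper name kids is hypothetical.

    ```apl
    ⎕IO←0
    d←0 1 2 2 1 2          ⍝ made-up depth vector, nodes in preorder

    ⍝ The parent of node i is the nearest preceding node whose depth is d[i]-1.
    ⍝ Depth-0 nodes are treated as their own parent (an assumption of this sketch).
    p←{0=d[⍵]:⍵ ⋄ ⊃⌽⍸(⍵↑d)=d[⍵]-1}¨⍳≢d     ⍝ p is 0 0 1 1 0 4

    ⍝ Children of a node: every index whose parent is that node, excluding the node itself.
    kids←{(⍸p=⍵)~⍵}
    k0←kids 0              ⍝ k0 is 1 4
    k1←kids 1              ⍝ k1 is 2 3
    ```

    This per-node search is written for readability rather than speed. The other vectors returned by PS are indexed node by node in the same way, so p[i] and d[i] describe the same node.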

    Also, this is very much still a work in progress. Currently, dotted namespace notation is not fully in place, control structures are just stubbed out, and name resolution has several sharp edges. There is also a laundry list of small not-yet-implemented features, mostly regarding tradfns.

    That said, if your code only uses dfns and doesn't dynamically bind names, then Co-dfns might work out of the box for you. We have had success compiling other projects under those constraints.

    Good luck, and don't hesitate to reach out!