Railroad diagrams in Pyparsing: How about Forward() declarations? Rule renaming?

I'm using pyparsing 3.0.9, python 3.9.16, and I'm trying to write a grammar for a (sub-)set of YAML. Not so much for the produced parser, as for the railroad diagrams. The actual state of the program is shown below.

The grammar (defined here), as expected, has recursion (mappings can contain mappings). However, I can't seem to find how (or where) to set the name so that it appears correctly in the diagram. Should I set it in the Forward() declaration, or in the actual definition? Every combination I tried produces errors in the output.
If I declare rules which derive from a common 'ancestor', I have to declare them with a copy() of that ancestor, else set_name() fails for all but the last one. This seems logical, except it doesn't always seem to work.
Some parts of the diagrams seem to be incorrect (not corresponding to the definition). Example: the node definition produces alias twice at the start.

Can someone point me in the right direction?

My code:

import pyparsing as pp

def make_parser():
    mapping = pp.Forward().set_name('mapping')
    label = pp.Word(pp.alphanums + '-_')
    true_false = pp.one_of('yes no true false').set_name('true_false')

    anchor = label.copy().set_name('anchor')
    tag    = label.copy().set_name('tag')
    alias  = label.copy().set_name('alias')

    key_value = (
        (pp.Keyword('yaml-scalar-event') +
            (pp.Keyword('yaml-scalar-event') ^ mapping))
    ).set_name('key_value')

    mapping = (
        pp.Keyword('yaml-mapping-start-event') +
        pp.ZeroOrMore(key_value) +
        pp.Keyword('yaml-mapping-end-event')
    )

    sequence = (
        anchor ^
        tag
    ).set_name('sequence')

    scalar = (
        alias ^
        tag ^
        ('plain_implicit' + true_false) ^
        ('quoted_implicit' + true_false) ^
        mapping
    ).set_name('scalar')

    node = (
        alias ^
        scalar ^
        sequence ^
        mapping
    ).set_name('node')

    document = (
        pp.Keyword('yaml-document-start-event') +
        pp.ZeroOrMore(node) +
        pp.Keyword('yaml-document-end-event')
    ).set_name('document')

    stream = (
        pp.Keyword('yaml-stream-start-event') +
        pp.ZeroOrMore(document) +
        pp.Keyword('yaml-stream-end-event')
    ).set_name('stream')

    return stream


def test_parser():
    parser = make_parser()

    parser.create_diagram('yaml_grammar.html',
        vertical = 2)



def main(args):
    parser = make_parser()
    parser.create_diagram('yaml_grammar.html', vertical = 2)


if __name__ == '__main__':
    import sys
    sys.exit(main(sys.argv))

Which produces the following output:

Solution

I love this! I also like Michael Milton's addition of railroad diagramming to pyparsing, and I've done some very similar work just to get a railroad diagram. Your question raised some interesting points about the railroad diagramming process, and I'm making a few tweaks to the pyparsing diagramming code to make the diagrams better.

First off, here are some changes in your parser to get a clean diagram:

def make_parser():
    """
    stream ::= STREAM-START document* STREAM-END
    document ::= DOCUMENT-START node DOCUMENT-END
    node ::= ALIAS | SCALAR | sequence | mapping
    sequence ::= SEQUENCE-START node* SEQUENCE-END
    mapping ::= MAPPING-START (node node)* MAPPING-END
    """

    # when I define Forwards, I try to go to the lowest possible
    # term in the BNF, in this case node
    # mapping = pp.Forward().set_name('mapping')
    node = pp.Forward().set_name("node")
    label = pp.Word(pp.alphanums + '-_')
    true_false = pp.one_of('yes no true false').set_name('true_false')

    anchor = label.copy().set_name('anchor')
    tag    = label.copy().set_name('tag')
    alias  = label.copy().set_name('alias')

    # add Group around key_value to keep from merging it with surrounding
    # terms in the diagram
    key_value = pp.Group(
        node + node
        # (pp.Keyword('yaml-scalar-event') +
        #     (pp.Keyword('yaml-scalar-event') ^ mapping))
    )#.set_name('key_value')
    # I suppressed the key_value naming because I liked the explicit node-node
    # element in the diagram instead of the indirect key_value label.

    mapping = (
        # pyparsing will auto-promote strings to Literals, which should
        # be sufficient for your diagramming efforts, and less typing for you
        # (just so long as the string is immediately preceded or followed by
        # some kind of pyparsing ParserElement)
        # pp.Keyword('yaml-mapping-start-event') +
        'yaml-mapping-start-event' +
        # replaced ZeroOrMore usage with [...], purely a style choice
        # pp.ZeroOrMore(key_value) +
        key_value[...] +
        'yaml-mapping-end-event'
    ).set_name("mapping")

    sequence = (
        anchor ^
        tag
    ).set_name('sequence')

    scalar = pp.Group(
        # alias and mapping are already included in node
        # alias ^
        tag ^
        ('plain_implicit' + true_false) ^
        ('quoted_implicit' + true_false) #^
        # mapping
    ).set_name('scalar')

    # IMPORTANT!!! - be sure to use '<<=', not '=' when defining the expression
    # that needs to be parsed by a Forward.
    node <<= (
        alias ^
        scalar ^
        sequence ^
        mapping
    ).set_name('node')

    document = (
        'yaml-document-start-event' +
        node[...] +
        'yaml-document-end-event'
    ).set_name('document')

    stream = (
        'yaml-stream-start-event' +
        document[...] +
        'yaml-stream-end-event'
    ).set_name('stream')

    return stream

My changes were:

- make node the Forward instead of mapping
- MUST USE "<<=" operator to define contents of a Forward (this is a common mistake, and pyparsing offers some diagnostic warnings to help catch it)
- changed key_value to just node + node, per the BNF
- removed alias and mapping from scalar, since they were being duplicated with node in the diagram
- cosmetic changes
  - changed Keyword literals to simple string literals
  - used [...] for repetition
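
To make the "<<=" point concrete, here is a minimal sketch on a toy grammar. The grammar and the names item, word and nested are made up for illustration and are not part of the YAML parser above. It also shows the string auto-promotion and [...] shorthand in a runnable form.

```python
import pyparsing as pp

# Hypothetical toy grammar, for illustration only:
#   item ::= WORD | "(" item* ")"
item = pp.Forward().set_name("item")
word = pp.Word(pp.alphas).set_name("word")

# Bare strings next to a ParserElement are auto-promoted to Literals,
# and expr[...] is shorthand for ZeroOrMore(expr).
nested = pp.Group("(" + item[...] + ")").set_name("nested")

# '<<=' fills in the existing Forward object. A plain '=' would only rebind
# the Python name 'item', leaving the Forward used inside 'nested' empty.
item <<= word | nested

result = item.parse_string("(a (b c) d)", parse_all=True).as_list()
print(result)
```

With a plain "=" in the last assignment, nested would still point at the original, never-defined Forward, which is essentially what happened to mapping in the question's code.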

Using this code to create the diagram:

    parser.create_diagram(
        'yaml_grammar.html',
        show_groups=False,
        vertical=2,
    )

gives this diagram:

I didn't like a couple of things. For one, even though I set show_groups to False, we still see a grouping around the key-value nodes, which is a bug I have now fixed. Also, using the (2) repetition indicator feels clunky when the repetition is only 2 elements long, so I've special-cased repetition to use this notation only for 3 or more elements.

With these fixes/changes (to be in the next pyparsing release), I now get this diagram. I hope it is close to your intended look (and I'm sorry to have taken so long to respond on this).
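
On the copy() part of the question: set_name() changes the element it is called on and returns that same element, so naming one shared label several times leaves only the last name. A small sketch of the difference (shared_a and shared_b are made-up names for illustration):

```python
import pyparsing as pp

label = pp.Word(pp.alphanums + "-_")

# Without copy(): both variables refer to the same object, so the second
# set_name() call overwrites the name given by the first.
shared_a = label.set_name("anchor")
shared_b = label.set_name("tag")
print(shared_a is shared_b, shared_a.name)

# With copy(): each rule is a separate object and keeps its own name.
label = pp.Word(pp.alphanums + "-_")
anchor = label.copy().set_name("anchor")
tag = label.copy().set_name("tag")
print(anchor.name, tag.name)
```

This is why the copy() calls in the question are needed for anchor, tag and alias to show up under separate names in the diagram.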