I'm using pyparsing 3.0.9 with Python 3.9.16, and I'm trying to write a grammar for a subset of YAML, not so much for the resulting parser as for the railroad diagrams. The current state of the program is shown below.
The grammar (defined here), as expected, is recursive (a mapping can contain mappings). However, I can't work out how (or where) to set the name so that it appears correctly in the diagram: in the Forward() declaration, or in the actual definition? Every combination I tried produces errors in the output.
If I declare rules that derive from a common 'ancestor', I have to declare them with a copy() of that ancestor; otherwise set_name() fails for all but the last one. This seems logical, except that it doesn't always seem to work.
Some parts of the diagrams seem to be incorrect (not corresponding to the definition). For example, the node definition produces alias twice at the start.
Can someone point me in the right direction?
My code:
import pyparsing as pp


def make_parser():
    mapping = pp.Forward().set_name('mapping')

    label = pp.Word(pp.alphanums + '-_')
    true_false = pp.one_of('yes no true false').set_name('true_false')

    anchor = label.copy().set_name('anchor')
    tag = label.copy().set_name('tag')
    alias = label.copy().set_name('alias')

    key_value = (
        (pp.Keyword('yaml-scalar-event') +
         (pp.Keyword('yaml-scalar-event') ^ mapping))
    ).set_name('key_value')

    mapping = (
        pp.Keyword('yaml-mapping-start-event') +
        pp.ZeroOrMore(key_value) +
        pp.Keyword('yaml-mapping-end-event')
    )

    sequence = (
        anchor ^
        tag
    ).set_name('sequence')

    scalar = (
        alias ^
        tag ^
        ('plain_implicit' + true_false) ^
        ('quoted_implicit' + true_false) ^
        mapping
    ).set_name('scalar')

    node = (
        alias ^
        scalar ^
        sequence ^
        mapping
    ).set_name('node')

    document = (
        pp.Keyword('yaml-document-start-event') +
        pp.ZeroOrMore(node) +
        pp.Keyword('yaml-document-end-event')
    ).set_name('document')

    stream = (
        pp.Keyword('yaml-stream-start-event') +
        pp.ZeroOrMore(document) +
        pp.Keyword('yaml-stream-end-event')
    ).set_name('stream')

    return stream


def test_parser():
    parser = make_parser()
    parser.create_diagram('yaml_grammar.html',
                          vertical = 2)


def main(args):
    parser = make_parser()
    parser.create_diagram('yaml_grammar.html', vertical = 2)


if __name__ == '__main__':
    import sys
    sys.exit(main(sys.argv))
Which produces the following output:
I love this! I agree, I like Michael Milton's addition of railroad diagramming to pyparsing and I've done some very similar work just to get a railroad diagram. Your question raised some interesting points about the railroad diagramming process, and I'm making a few tweaks to the pyparsing diagramming code to make the diagrams better.
First off, here are some changes in your parser to get a clean diagram:
import pyparsing as pp


def make_parser():
    """
    stream ::= STREAM-START document* STREAM-END
    document ::= DOCUMENT-START node DOCUMENT-END
    node ::= ALIAS | SCALAR | sequence | mapping
    sequence ::= SEQUENCE-START node* SEQUENCE-END
    mapping ::= MAPPING-START (node node)* MAPPING-END
    """
    # when I define Forwards, I try to go to the lowest possible
    # term in the BNF, in this case node
    # mapping = pp.Forward().set_name('mapping')
    node = pp.Forward().set_name("node")

    label = pp.Word(pp.alphanums + '-_')
    true_false = pp.one_of('yes no true false').set_name('true_false')

    anchor = label.copy().set_name('anchor')
    tag = label.copy().set_name('tag')
    alias = label.copy().set_name('alias')

    # add Group around key_value to keep from merging it with surrounding
    # terms in the diagram
    key_value = pp.Group(
        node + node
        # (pp.Keyword('yaml-scalar-event') +
        #  (pp.Keyword('yaml-scalar-event') ^ mapping))
    )  # .set_name('key_value')
    # I suppressed the key_value naming because I liked the explicit node-node
    # element in the diagram instead of the indirect key_value label.

    mapping = (
        # pyparsing will auto-promote strings to Literals, which should
        # be sufficient for your diagramming efforts, and less typing for you
        # (just so long as the string is immediately preceded or followed by
        # some kind of pyparsing ParserElement)
        # pp.Keyword('yaml-mapping-start-event') +
        'yaml-mapping-start-event' +
        # replaced ZeroOrMore usage with [...], purely a style choice
        # pp.ZeroOrMore(key_value) +
        key_value[...] +
        'yaml-mapping-end-event'
    ).set_name("mapping")

    sequence = (
        anchor ^
        tag
    ).set_name('sequence')

    scalar = pp.Group(
        # alias and mapping are already included in node
        # alias ^
        tag ^
        ('plain_implicit' + true_false) ^
        ('quoted_implicit' + true_false)  # ^
        # mapping
    ).set_name('scalar')

    # IMPORTANT!!! - be sure to use '<<=', not '=' when defining the expression
    # that needs to be parsed by a Forward.
    node <<= (
        alias ^
        scalar ^
        sequence ^
        mapping
    ).set_name('node')

    document = (
        'yaml-document-start-event' +
        node[...] +
        'yaml-document-end-event'
    ).set_name('document')

    stream = (
        'yaml-stream-start-event' +
        document[...] +
        'yaml-stream-end-event'
    ).set_name('stream')

    return stream
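The '<<=' comment in the code above is the part that is easiest to get wrong, so here is a minimal sketch of the difference. It uses a toy bracket grammar rather than the YAML one, and the names placeholder, forward_left_empty, item and nested are made up for this illustration only.

```python
import pyparsing as pp

# Pitfall: '=' rebinds the Python name to a brand-new expression, so the
# Forward that key_value already refers to is never given a definition.
mapping = pp.Forward().set_name("mapping")
placeholder = mapping                        # keep a handle on the Forward
key_value = pp.Group("key" + mapping)        # refers to the still-empty Forward
mapping = "start" + key_value[...] + "end"   # rebinds the name only
forward_left_empty = placeholder.expr is None

# Fix: '<<=' stores the definition inside the existing Forward object,
# so every expression that already refers to it sees the definition.
item = pp.Forward().set_name("item")
nested = pp.Group(
    pp.Suppress("[") + item[...] + pp.Suppress("]")
).set_name("nested")
item <<= pp.Word(pp.alphas) | nested

parsed = item.parse_string("[a [b c] d]", parse_all=True).as_list()
print(forward_left_empty)
print(parsed)
```

In the first half, the diagram (and the parser) would still be looking at the empty Forward, which I believe is why naming it never seemed to have the expected effect in the original code.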
My changes were:
- made node the Forward instead of mapping
- changed key_value to node + node, per the BNF
- removed alias and mapping from scalar, since they were being duplicated with node in the diagram
- used [...] for repetition

Using this code to create the diagram:
parser = make_parser()
parser.create_diagram(
    'yaml_grammar.html',
    show_groups=False,
    vertical=2,
)
gives this diagram:
I didn't like a couple of things. For one, even though I set show_groups to False, we still see a grouping around the key-value nodes, which is a bug I have now fixed. Also, using the (2) repetition indicator feels clunky when the repetition is only 2 elements long, so I've special-cased repetition to only use this notation for 3 or more elements.
With these fixes/changes (to be in the next pyparsing release), I now get this diagram. I hope it is close to your intended look (and I'm sorry to have taken so long to respond on this).
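On the copy() part of the question: as far as I can tell, what you saw is expected, because set_name() renames the expression in place and returns that same object. Without copy(), anchor, tag and alias would all be the one label object, carrying whichever name was set last. A minimal sketch (the *_shared variable names are only for illustration):

```python
import pyparsing as pp

# Without copy(): both variables are the same object, so the last name wins.
label = pp.Word(pp.alphanums + "-_")
anchor_shared = label.set_name("anchor")
tag_shared = label.set_name("tag")
same_object = anchor_shared is tag_shared
shared_name = anchor_shared.name

# With copy(): each rule is its own object and keeps its own name.
label = pp.Word(pp.alphanums + "-_")
anchor = label.copy().set_name("anchor")
tag = label.copy().set_name("tag")

print(same_object, shared_name)
print(anchor.name, tag.name)
```

So the copy() calls in your grammar are the right approach; the naming trouble came from the Forward being rebound rather than from copy().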