As seen below, each repeated record starts with a dashed line. Then some key-value pairs appear, and finally there is a description with an unknown number of lines. The whole input ends with EOF.
I have a problem with the description. If it is not the last record, the description ends where the next dashed line starts, but for the last record it ends at EOF.
So I'm quite confused about how to build a grammar for "description". What approach would you suggest for this kind of format?
Thank you.
------
AAA: Value1
BBB: Value2
Description
Lorem ipsum dolor sit amet
consectetur adipiscing elit.
------
AAA: Value3
BBB: Value4
CCC: Value5
DDD: Value6
Description
In efficitur, turpis sit amet malesuada dignissim
Turpis nunc imperdiet ipsum, eu auctor leo arcu at libero
consectetur adipiscing elit.
------
AAA: Value7
BBB: Value
EEE: Value6
Description
In efficitur, turpis sit amet malesuada dignissim
Turpis nunc imperdiet ipsum, eu auctor leo arcu at libero
consectetur adipiscing elit
Lorem ipsum dolor sit amet.
See how `msg_terminator` is used in this sample code. It is needed in two places: once to detect the end of the repetition in the definition of `msg`, and once in the overall `entry_expr`, so it is helpful to define it as an expression of its own.
I've also added some features of pyparsing in this example beyond the basics:
- `ParserElement.set_default_whitespace_chars` for a parser that has significant newlines
- `[...]` for `ZeroOrMore`, and `[...:expr]` for `ZeroOrMore` with `stop_on=expr`
- `expr("name")` for `expr.set_results_name("name")`
- `Dict` to auto-name contained groups of key-value expressions
- `pp.common` expressions to parse a timestamp and convert to a Python `datetime`
- `pp.Empty` to advance past optional whitespace
I hope these help you in other parts of your parser.
# https://stackoverflow.com/questions/75782477/how-to-use-pyparsing-for-multilined-fields-that-has-two-different-types-of-endin
sample = """\
timestamp: 2001-01-01 12:34
Color: red
msg
Now is the Winter of our discontent
Made glorious Summer by this sun of York.
---
timestamp: 2001-01-01 12:34
Color: mauve
Material: poly-cotton
msg
Tomorrow and tomorrow and tomorrow
Creeps in this petty pace from day to day.
"""
import pyparsing as pp

# newlines are significant in this format, so only skip spaces as whitespace
pp.ParserElement.set_default_whitespace_chars(" ")

NL = pp.LineEnd().suppress()
COLON = pp.Suppress(":")

# copy() so the parse action is not attached to the shared pp.common expression
timestamp = pp.common.iso8601_datetime.copy().add_parse_action(
    pp.common.convert_to_datetime("%Y-%m-%d %H:%M")
)

# "name: value" line; pp.Empty() skips the spaces after the colon
tag = pp.Group(
    pp.Word(pp.alphas, pp.alphanums)("tag")
    + COLON
    + pp.Empty()
    + pp.rest_of_line("value")
)

# look for terminating "---" OR the end of the string
msg_terminator = ('---' + NL | pp.StringEnd()).suppress()

msg = pp.Group(
    pp.Suppress("msg" + NL)
    # the following line is equivalent to
    # pp.ZeroOrMore(pp.rest_of_line + NL, stop_on=msg_terminator)
    + (pp.rest_of_line + NL)[...:msg_terminator]
)

entry_expr = pp.Group(
    pp.Suppress('timestamp:') + timestamp("timestamp") + NL
    + pp.Dict((tag + NL)[...])("tags")
    + msg("msg")
    + msg_terminator
)

for entry in entry_expr[...].parse_string(sample):
    print(entry.dump())
Prints:
[datetime.datetime(2001, 1, 1, 12, 34), [['Color', 'red']], ['Now is the Winter of our discontent', 'Made glorious Summer by this sun of York.']]
- msg: ['Now is the Winter of our discontent', 'Made glorious Summer by this sun of York.']
- tags: [['Color', 'red']]
  - Color: 'red'
  [0]:
    ['Color', 'red']
    - tag: 'Color'
    - value: 'red'
- timestamp: datetime.datetime(2001, 1, 1, 12, 34)
[0]:
  2001-01-01 12:34:00
[1]:
  [['Color', 'red']]
  - Color: 'red'
  [0]:
    ['Color', 'red']
    - tag: 'Color'
    - value: 'red'
[2]:
  ['Now is the Winter of our discontent', 'Made glorious Summer by this sun of York.']
[datetime.datetime(2001, 1, 1, 12, 34), [['Color', 'mauve'], ['Material', 'poly-cotton']], ['Tomorrow and tomorrow and tomorrow', 'Creeps in this petty pace from day to day.']]
- msg: ['Tomorrow and tomorrow and tomorrow', 'Creeps in this petty pace from day to day.']
- tags: [['Color', 'mauve'], ['Material', 'poly-cotton']]
  - Color: 'mauve'
  - Material: 'poly-cotton'
  [0]:
    ['Color', 'mauve']
    - tag: 'Color'
    - value: 'mauve'
  [1]:
    ['Material', 'poly-cotton']
    - tag: 'Material'
    - value: 'poly-cotton'
- timestamp: datetime.datetime(2001, 1, 1, 12, 34)
[0]:
  2001-01-01 12:34:00
[1]:
  [['Color', 'mauve'], ['Material', 'poly-cotton']]
  - Color: 'mauve'
  - Material: 'poly-cotton'
  [0]:
    ['Color', 'mauve']
    - tag: 'Color'
    - value: 'mauve'
  [1]:
    ['Material', 'poly-cotton']
    - tag: 'Material'
    - value: 'poly-cotton'
[2]:
  ['Tomorrow and tomorrow and tomorrow', 'Creeps in this petty pace from day to day.']