
Grammar parser for parsing parliamentary debates?


I'm looking to parse the plain text from a transcription tool (the goal is to render it into LegalDocML).

My issue is that I do not know where to start, and learning to use a grammar parser involves quite a steep learning curve. I'm looking for guidance on what kind of parser would be appropriate for the problem.

My gut feeling is that the text below is a candidate for LR grammar tools, since there seem to be some clear delimiters (all caps for the speaker, parentheses for the speaker's electorate and role, square brackets for the speech time). But there are also some NLP needs: in grievances, for example, the person the speech is addressed to is often only loosely identified in the first sentence of the speech.
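Those delimiters are regular enough that one regular expression can split a speech-opening line into its parts. The Python sketch below assumes every speech opens with a line shaped like the `MR R.S. LOVE (...) [9.06 am]:` line in the sample transcript, with the electorate and an optional role separated by a dash inside the parentheses. `SPEAKER_LINE`, `SpeakerLine`, `parse_speaker_line` and `grievance_addressee` are hypothetical names, not part of any library. The addressee heuristic just takes the words after "I grieve today to" up to "on behalf of" or the first period or semicolon, so initials such as "M.H." will cut it short; it is a crude stand-in for real NLP.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed shape of a speech-opening line, read off the sample:
#   NAME IN CAPITALS (Electorate <dash> Role) [h.mm am]: speech text
SPEAKER_LINE = re.compile(
    r"^(?P<name>[A-Z][A-Z.'\s-]*?[A-Z.])"      # speaker's name in capitals
    r"\s*\((?P<paren>[^)]*)\)"                  # electorate and optional role
    r"\s*\[(?P<time>\d{1,2}\.\d{2}\s*[ap]m)\]"  # speech time in square brackets
    r":\s*(?P<text>.*)$"                        # rest of the line is the speech
)
# Electorate and role are separated by an en dash, em dash or hyphen with spaces.
ROLE_SEPARATOR = re.compile(r"\s+[\u2013\u2014-]\s+")
# Crude heuristic: "I grieve (today) to <addressee>" up to "on behalf of", "." or ";".
GRIEVANCE = re.compile(
    r"\bI grieve (?:today )?to (?:the )?(?P<addressee>.+?)(?=\s+on behalf of\b|[.;]|$)",
    re.IGNORECASE,
)


@dataclass
class SpeakerLine:
    name: str
    electorate: str
    role: Optional[str]
    time: str
    text: str


def parse_speaker_line(line: str) -> Optional[SpeakerLine]:
    """Split a speech-opening line into its parts; return None for any other line."""
    m = SPEAKER_LINE.match(line.strip())
    if m is None:
        return None
    parts = ROLE_SEPARATOR.split(m.group("paren"), maxsplit=1)
    role = parts[1].strip() if len(parts) > 1 else None
    return SpeakerLine(
        name=m.group("name").strip(),
        electorate=parts[0].strip(),
        role=role,
        time=m.group("time"),
        text=m.group("text"),
    )


def grievance_addressee(speech: str) -> Optional[str]:
    """Guess whom a grievance is addressed to from its opening words."""
    m = GRIEVANCE.search(speech)
    return m.group("addressee").strip() if m else None
```

Lines that return `None`, such as "THE SPEAKER (Mrs M.H. Roberts) took the chair ...", have no bracketed time and are left for other rules to handle.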

Any advice would be appreciated.

As a sample:

Legislative Assembly
Thursday, 19 May 2022
               
THE SPEAKER (Mrs M.H. Roberts) took the chair at 9.00 am, acknowledged country and read prayers.
PAPER TABLED
A paper was tabled and ordered to lie upon the table of the house.
SMALL BUSINESS ASSISTANCE GRANTS
Statement by Minister for Small Business
Statement
MR D.T. PUNCH (Bunbury — Minister for Small Business) [9.01 am]: I would like to bring to the attention of the house some recent changes made by the McGowan government to the small business assistance grants. As I have previously advised the house, in February the state government announced a $67 million level 1 COVID-19 business assistance package, and more recently a $72 million package for businesses impacted by level 2 public health and social measures, taking the total committed to COVID-19 business support to almost $1.7 billion over the past two years. The level 1 package includes $42 million in rent relief assistance and the level 2 package includes a $66.8 million small business hardship grants program.
Last month, a revision and expansion of the small business hardship grants program was announced.
.
.
.
HOME INDEMNITY INSURANCE
Grievance
MR R.S. LOVE (Moore — Deputy Leader of the Opposition) [9.06 am]: I grieve today to the Parliamentary Secretary to the Minister for Commerce on behalf of Western Australian residents who have had their

Solution

  • This problem is indeed in an awkward wasteland between context-free parsing, which is far too precise to handle unstructured discourse, and natural language parsing, which (as I understand the current state of the art) is not designed to take advantage of subtle printed clues.

    My recommendation, for what it's worth, is to use a collection of ad hoc regular expressions to capture the printed style and the boilerplate phrases (such as "A paper was tabled and ordered to lie upon the table of the house."). That's what I did a couple of decades ago when I tried something similar with the Canadian equivalent (in the days when Perl was state of the art), and it mostly worked, although a certain amount of manual intervention was required. (My style is to use sanity checks to detect cases that are mishandled, and to log them so the patterns can be improved later.) How much work all that takes will depend on how precise you need the results to be.
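    The regex-and-sanity-check approach might look something like this in Python for the sample in the question. It is a sketch under the assumption that each line of the transcript is one of a handful of kinds (chamber, sitting date, boilerplate, heading, debate type, speech start, or continuation of a speech). The pattern names and `classify` are hypothetical, and the patterns are guesses read off the sample that a real transcript will quickly outgrow. Any line that fits none of them is logged, which is the sanity-check-and-log loop described above.

```python
import logging
import re

log = logging.getLogger("transcript")

# Ad hoc patterns guessed from the sample; extend them as the logs reveal new cases.
CHAMBER = re.compile(r"^Legislative (Assembly|Council)$")
SITTING_DATE = re.compile(
    r"^(Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday), \d{1,2} [A-Z][a-z]+ \d{4}$"
)
BOILERPLATE = [
    re.compile(r"^A paper was tabled and ordered to lie upon the table of the house\.$"),
    re.compile(r"^THE SPEAKER \(.+\) took the chair at .+, acknowledged country and read prayers\.$"),
]
SPEECH_START = re.compile(r"^[A-Z][A-Z.' -]*\(.*\) \[\d{1,2}\.\d{2} [ap]m\]: ")
HEADING = re.compile(r"^[A-Z0-9][A-Z0-9 ,'&()-]*$")    # e.g. "PAPER TABLED"
DEBATE_TYPE = re.compile(r"^(Statement|Grievance)\b")   # e.g. "Statement by Minister for ..."


def classify(lines):
    """Yield (kind, line) for each non-blank line, logging lines no pattern explains."""
    in_speech = False
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue
        if CHAMBER.match(line):
            kind = "chamber"
        elif SITTING_DATE.match(line):
            kind = "date"
        elif any(p.match(line) for p in BOILERPLATE):
            kind = "boilerplate"
        elif SPEECH_START.match(line):
            kind, in_speech = "speech_start", True
        elif HEADING.match(line):
            kind, in_speech = "heading", False
        elif DEBATE_TYPE.match(line):
            kind, in_speech = "debate_type", False
        elif in_speech:
            kind = "continuation"
        else:
            # Sanity check: prose outside any speech means a pattern is missing.
            kind = "unknown"
            log.warning("line %d not classified: %r", number, line)
        yield kind, line
```

    Reading the warnings after a run over a full sitting shows which patterns to add next, and the `(kind, line)` pairs are a convenient input for grouping lines into debates and speeches before producing LegalDocML.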

    It's quite possible that you could build a machine learning model that does a reasonable job, if you have access to enough computational resources. But you'll still need to do a lot of verification and recalibration unless you can tolerate errors.