I understand that TextFSM is a good way to parse text files. However, it seems to match data one line at a time, so my question is how to parse values that are spread over multiple lines.
<Page>
CUSIP No. 123456                 13G              Page 2 of 10 Pages
-----------------------------------------------------------------------------
  (1)  NAMES OF REPORTING PERSONS
       ABC Ltd.
-----------------------------------------------------------------------------
  (2)  CHECK THE APPROPRIATE BOX IF A MEMBER OF A GROUP
       (a) [ ]
       (b) [X]
--------------------------------------------------------------------------------
  (3)  SEC USE ONLY
--------------------------------------------------------------------------------
  (4)  CITIZENSHIP OR PLACE OF ORGANIZATION
       Bruny Islands
--------------------------------------------------------------------------------
  NUMBER OF       (5)  SOLE VOTING POWER
                       0
  SHARES          -----------------------------------------------------------------
  BENEFICIALLY    (6)  SHARED VOTING POWER
                       1,025,824 shares of Common Stock
  OWNED BY        --------------------------------------------------------------
  EACH            (7)  SOLE DISPOSITIVE POWER
                       0
  REPORTING       --------------------------------------------------------------
  PERSON WITH:    (8)  SHARED DISPOSITIVE POWER
                       1,025,824 shares of Common Stock
-----------------------------------------------------------------------------
  (9)  AGGREGATE AMOUNT BENEFICIALLY OWNED BY EACH REPORTING PERSON
       1,025,824 shares of Common Stock
-----------------------------------------------------------------------------
  (10) CHECK BOX IF THE AGGREGATE AMOUNT
       IN ROW (9) EXCLUDES CERTAIN SHARES
       [ ]
-----------------------------------------------------------------------------
  (11) PERCENT OF CLASS REPRESENTED
       BY AMOUNT IN ROW (9)
       4.15%
-----------------------------------------------------------------------------
  (12) TYPE OF REPORTING PERSON
       CO
-----------------------------------------------------------------------------
In the text above, I want to parse NAMES OF REPORTING PERSONS and CITIZENSHIP OR PLACE OF ORGANIZATION, but each value sits on the line after its label rather than on the same line. What is the best way to approach this problem?
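You can do this with TextFSM state transitions. When a label line such as NAMES OF REPORTING PERSONS matches, jump to a dedicated state whose rule captures the value on the following line; the dashed separator then sends the parser back to Start, and the NUMBER OF line, which comes after both fields, triggers Record to emit the row.
This template does what you need (rule lines must be indented by one or two spaces, and a blank line separates the Value block and each state):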
Value REPORTING_PERSONS (\S+[\S ]+)
Value CITIZENSHIP (\S+[\S ]+)

Start
  ^.+NAMES OF REPORTING PERSONS -> Person
  ^.+CITIZENSHIP OR PLACE OF ORGANIZATION -> Citizenship
  ^ +NUMBER OF -> Record

Person
  ^ +${REPORTING_PERSONS}
  ^-+ -> Start

Citizenship
  ^ +${CITIZENSHIP}
  ^-+ -> Start
Result:
REPORTING_PERSONS    CITIZENSHIP
-------------------  -------------
ABC Ltd.             Bruny Islands
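If it helps to see what the state transitions are doing, here is the same logic written by hand with Python's re module. This is not TextFSM and not how the library is implemented; it is an illustrative sketch in which parse_cover_page, LABELS and the other names are hypothetical. Each label line switches the parser into a capture state, the indented next line supplies the value, a dashed separator returns to Start, and the NUMBER OF line plays the role of Record.

```python
import re

# Hypothetical hand-written equivalent of the template above (not TextFSM).
# Start-state rules: a label line switches into a capture state named after the field.
LABELS = [
    (re.compile(r"^.+NAMES OF REPORTING PERSONS"), "REPORTING_PERSONS"),
    (re.compile(r"^.+CITIZENSHIP OR PLACE OF ORGANIZATION"), "CITIZENSHIP"),
]
VALUE_LINE = re.compile(r"^ +(\S+[\S ]+)")  # indented line holding the value
SEPARATOR = re.compile(r"^-+")              # dashed rule: go back to Start
RECORD_LINE = re.compile(r"^ +NUMBER OF")   # acts like "-> Record"


def parse_cover_page(text):
    """Return one dict per record, keyed by the field names above."""
    records = []
    current = {}
    state = "Start"  # "Start" or the name of the field being captured
    for line in text.splitlines():
        if state == "Start":
            for pattern, field in LABELS:
                if pattern.match(line):
                    state = field
                    break
            else:  # no label matched on this line
                if RECORD_LINE.match(line) and current:
                    records.append(current)
                    current = {}
        else:
            value = VALUE_LINE.match(line)
            if value:
                current[state] = value.group(1)
            elif SEPARATOR.match(line):
                state = "Start"
    return records


SAMPLE = """\
  (1)  NAMES OF REPORTING PERSONS
       ABC Ltd.
-----------------------------------------------------------------------------
  (4)  CITIZENSHIP OR PLACE OF ORGANIZATION
       Bruny Islands
-----------------------------------------------------------------------------
  NUMBER OF       (5)  SOLE VOTING POWER
"""

print(parse_cover_page(SAMPLE))
# [{'REPORTING_PERSONS': 'ABC Ltd.', 'CITIZENSHIP': 'Bruny Islands'}]
```

Unlike TextFSM, which outputs a fixed list of columns per record, this sketch returns a dict containing only the fields it actually found.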
You can find more examples in the TextFSM Code Lab: https://github.com/google/textfsm/wiki/Code-Lab