
How to parse text over multiple lines with textfsm?


I understand that TextFSM is a good way to parse text files. However, it seems to parse data one line at a time. How can I parse values that are spread over multiple lines?

    <Page>


CUSIP No. 123456                  13G                   Page 2 of 10 Pages
-----------------------------------------------------------------------------
     (1)    NAMES OF REPORTING PERSONS

            ABC Ltd.

-----------------------------------------------------------------------------
     (2)    CHECK THE APPROPRIATE BOX IF A MEMBER OF A GROUP
                                                               (a)  [ ]
                                                               (b)  [X]
--------------------------------------------------------------------------------
     (3)    SEC USE ONLY
--------------------------------------------------------------------------------
     (4)    CITIZENSHIP OR PLACE OF ORGANIZATION

            Bruny Islands
--------------------------------------------------------------------------------
NUMBER OF      (5)   SOLE VOTING POWER
                     0
SHARES         -----------------------------------------------------------------

BENEFICIALLY   (6)   SHARED VOTING POWER

1,025,824 shares of Common Stock


OWNED BY       --------------------------------------------------------------

EACH           (7)   SOLE DISPOSITIVE POWER
                     0
REPORTING      --------------------------------------------------------------

PERSON WITH:   (8)   SHARED DISPOSITIVE POWER

1,025,824 shares of Common Stock


-----------------------------------------------------------------------------
     (9)    AGGREGATE AMOUNT BENEFICIALLY OWNED BY EACH REPORTING PERSON

1,025,824 shares of Common Stock


-----------------------------------------------------------------------------
     (10)   CHECK BOX IF THE AGGREGATE AMOUNT
            IN ROW (9) EXCLUDES CERTAIN SHARES
                                                                          [ ]
-----------------------------------------------------------------------------
     (11)   PERCENT OF CLASS REPRESENTED
            BY AMOUNT IN ROW (9)
            4.15%
-----------------------------------------------------------------------------
     (12)   TYPE OF REPORTING PERSON
            CO
-----------------------------------------------------------------------------

In the text above, I want to extract the NAMES OF REPORTING PERSONS and CITIZENSHIP OR PLACE OF ORGANIZATION values, which are not on the same line as their labels. What is the best way to approach this problem?


Solution

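  • You can do this with TextFSM state transitions. When a line matches a field's label, the template switches to a dedicated state that captures the indented value on the following lines, then returns to Start at the next dashed separator line.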

    This template does what you need:

    Value REPORTING_PERSONS (\S+[\S ]+)
    Value CITIZENSHIP (\S+[\S ]+)
    
    Start
      ^.+NAMES OF REPORTING PERSONS -> Person
      ^.+CITIZENSHIP OR PLACE OF ORGANIZATION -> Citizenship
      ^\s*NUMBER OF -> Record
    
    Person
      ^ +${REPORTING_PERSONS}
      ^-+ -> Start
    
    Citizenship
      ^ +${CITIZENSHIP}
      ^-+ -> Start
    

    Result:

    REPORTING_PERSONS    CITIZENSHIP
    -------------------  -------------
    ABC Ltd.             Bruny Islands
    

    More examples are in the TextFSM Code Lab: https://github.com/google/textfsm/wiki/Code-Lab
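To make the state transitions more concrete, here is a minimal plain-Python sketch (standard library only, not the textfsm package) that roughly mirrors the template: a label line switches state, an indented line in that state captures the value, and a dashed line returns to Start. The names `SAMPLE`, `RULES` and `parse` are hypothetical, chosen for illustration only, and `SAMPLE` is a trimmed excerpt of the filing in the question.

```python
import re

# Hypothetical trimmed excerpt of the filing from the question (relevant rows only).
SAMPLE = """\
-----------------------------------------------------------------------------
     (1)    NAMES OF REPORTING PERSONS

            ABC Ltd.

-----------------------------------------------------------------------------
     (3)    SEC USE ONLY
--------------------------------------------------------------------------------
     (4)    CITIZENSHIP OR PLACE OF ORGANIZATION

            Bruny Islands
--------------------------------------------------------------------------------
NUMBER OF      (5)   SOLE VOTING POWER
                     0
"""

# Same value regex as the template: (\S+[\S ]+)
VALUE = r"(?P<value>\S+[\S ]+)"

# Hypothetical rule table: state -> ordered rules of
# (pattern, value name or None, next state or None to stay, record the row?)
RULES = {
    "Start": [
        (r"^.+NAMES OF REPORTING PERSONS", None, "Person", False),
        (r"^.+CITIZENSHIP OR PLACE OF ORGANIZATION", None, "Citizenship", False),
        # Accepts NUMBER OF with or without leading spaces.
        (r"^\s*NUMBER OF", None, None, True),
    ],
    "Person": [
        (r"^ +" + VALUE, "REPORTING_PERSONS", None, False),
        (r"^-+", None, "Start", False),
    ],
    "Citizenship": [
        (r"^ +" + VALUE, "CITIZENSHIP", None, False),
        (r"^-+", None, "Start", False),
    ],
}


def parse(text):
    """Feed the text through the state machine one line at a time."""
    state, row, records = "Start", {}, []
    for line in text.splitlines():
        # As in TextFSM, the first matching rule of the current state wins.
        for pattern, name, next_state, record in RULES[state]:
            match = re.match(pattern, line)
            if not match:
                continue
            if name:
                row[name] = match.group("value").rstrip()
            if record and row:
                records.append(row)
                row = {}
            if next_state:
                state = next_state
            break
    # TextFSM also saves a pending, non-empty record at end of input.
    if row:
        records.append(row)
    return records


print(parse(SAMPLE))
# [{'REPORTING_PERSONS': 'ABC Ltd.', 'CITIZENSHIP': 'Bruny Islands'}]
```

With the textfsm package itself you don't need to write this loop: the template is loaded with `textfsm.TextFSM(template_file)`, where `template_file` is an open file object, and `ParseText(text)` returns the rows as lists ordered like the parser's `header`.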