unix awk

How to filter columns in awk?


I was wondering how to filter the following lines in AWK:

DSL - 

  1. Digital Simulation Language.  Extensions to FORTRAN to simulate analog
computer functions.  "DSL/90 - A Digital Simulation Program for Continuous
System Modelling", Proc SJCC 28, AFIPS (Spring 1966).  Version: DSL/90 for
the IBM 7090.  Sammet 1969, p.632.

FLIP - 

  1. Early assembly language on G-15.  Listed in CACM 2(5):16 (May 1959).

  2. "FLIP User's Manual", G. Kahn, TR 5, INRIA 1981.

  3. Formal LIst Processor.  Early language for pattern-matching on LISP
structures.  Similar to CONVERT.  "FLIP, A Format List Processor", W.
Teitelman, Memo MAC-M-263, MIT 1966.

So I can get something like this:

DSL

FLIP

I am using the following statements in AWK:

BEGIN { RS = "\n\n\n" ;  FS = " - " } 

{ print $1 }

But what I get is just this:

DSL

Thanks in advance!


Solution

  • @JonathanLeffler gave you a good awk answer to your specific question, but if you're going to work with files in that format a lot, you may want to consider reformatting them so that records are separated by blank lines and each list item sits on a single line, e.g.:

    $ cat file
    DSL -
    
      1. Digital Simulation Language.  Extensions to FORTRAN to simulate analog
    computer functions.  "DSL/90 - A Digital Simulation Program for Continuous
    System Modelling", Proc SJCC 28, AFIPS (Spring 1966).  Version: DSL/90 for
    the IBM 7090.  Sammet 1969, p.632.
    
    FLIP -
    
      1. Early assembly language on G-15.  Listed in CACM 2(5):16 (May 1959).
    
      2. "FLIP User's Manual", G. Kahn, TR 5, INRIA 1981.
    
      3. Formal LIst Processor.  Early language for pattern-matching on LISP
    structures.  Similar to CONVERT.  "FLIP, A Format List Processor", W.
    Teitelman, Memo MAC-M-263, MIT 1966.
    
    $ awk '!/^[[:space:]]*$/{printf "%s%s", (NF==2 && /-[[:space:]]*$/ ? rs rs : (/^ +[[:digit:]]+\./ ? rs : " ")), $0; rs="\n"} END{print ""}' file
    DSL -
      1. Digital Simulation Language.  Extensions to FORTRAN to simulate analog computer functions.  "DSL/90 - A Digital Simulation Program for Continuous System Modelling", Proc SJCC 28, AFIPS (Spring 1966).  Version: DSL/90 for the IBM 7090.  Sammet 1969, p.632.
    
    FLIP -
      1. Early assembly language on G-15.  Listed in CACM 2(5):16 (May 1959).
      2. "FLIP User's Manual", G. Kahn, TR 5, INRIA 1981.
      3. Formal LIst Processor.  Early language for pattern-matching on LISP structures.  Similar to CONVERT.  "FLIP, A Format List Processor", W. Teitelman, Memo MAC-M-263, MIT 1966.
    

    The command skips blank lines, starts each header line on a new paragraph, puts each numbered item on its own line, and joins wrapped continuation lines with a single space. That way you can process the output easily to print or do whatever else you want, e.g.

    1) to print every header line plus the first bullet item:

    $ awk '...' file | awk 'BEGIN{RS=""; ORS="\n\n"; FS=OFS="\n"} {print $1,$2}'
    DSL -
      1. Digital Simulation Language.  Extensions to FORTRAN to simulate analog computer functions.  "DSL/90 - A Digital Simulation Program for Continuous System Modelling", Proc SJCC 28, AFIPS (Spring 1966).  Version: DSL/90 for the IBM 7090.  Sammet 1969, p.632.
    
    FLIP -
      1. Early assembly language on G-15.  Listed in CACM 2(5):16 (May 1959).
    

    2) to print the header line plus the second bullet item of just the "FLIP" record:

    $ awk '...' file | awk 'BEGIN{RS=""; ORS="\n\n"; FS=OFS="\n"} /^FLIP -/{print $1,$3}'
    FLIP -
      2. "FLIP User's Manual", G. Kahn, TR 5, INRIA 1981.
    

    3) to print the header line plus a count of the bullet items for that header:

    $ awk '...' file | awk 'BEGIN{RS=""; FS=OFS="\n"} {print $1 " " (NF-1)}'
    DSL - 1
    FLIP - 3
    

    etc., etc.
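
For the question as literally asked (print just the header names), here is a minimal sketch that works on the original, unreformatted file. It is not the answer referred to above, only one possible approach. It assumes a header is any line that starts in the first column and ends in " -", optionally followed by trailing whitespace. The names `file` and `headers` are illustrative only. As for why the attempt in the question prints only DSL, a likely explanation is this: awk implementations that accept a multi-character RS (gawk is one) look for three consecutive newlines, and if the blocks in the file are separated by a single blank line, that sequence never occurs. The whole file then becomes one record, and $1 is the text before the first " - ".

```shell
# Recreate part of the sample from the question in a scratch file,
# assuming one blank line between blocks.
file=$(mktemp)
cat > "$file" <<'EOF'
DSL -

  1. Digital Simulation Language.  Extensions to FORTRAN to simulate analog
computer functions.  "DSL/90 - A Digital Simulation Program for Continuous
System Modelling", Proc SJCC 28, AFIPS (Spring 1966).  Version: DSL/90 for
the IBM 7090.  Sammet 1969, p.632.

FLIP -

  1. Early assembly language on G-15.  Listed in CACM 2(5):16 (May 1959).

  2. "FLIP User's Manual", G. Kahn, TR 5, INRIA 1981.
EOF

# Assumption: a header starts in column 1 and ends in " -" plus optional
# whitespace. Strip that suffix and print what is left.
headers=$(awk '/^[^[:space:]].* -[[:space:]]*$/ { sub(/ -[[:space:]]*$/, ""); print }' "$file")
printf '%s\n' "$headers"
rm -f "$file"
```

On this sample it prints DSL and FLIP, one per line. Matching line by line avoids depending on RS at all, so the result does not change with how many blank lines separate the blocks.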