
awk to extract a block of text


I am trying to figure out an awk command/script to extract a block of text from a large file. The file subsection I am interested in is this:

  Board Info: #512
    Manufacturer: "Dell Inc."
    Product: "0X3D66"
    Version: "A02"
    Serial: "..CN7016343F00IE."
  Chassis Info: #768

The Board Info and Chassis Info lines have 2 leading spaces, while the indented block has 4. I would rather not assume that the ending line starts with Chassis Info (it could be something else); I just want to stop at the next line that starts with exactly 2 spaces.

This:

awk '/^\s{2}Board Info/,/^\s{2}[^B ]/' dump.txt

solves this particular instance but will not work if, instead of 'Chassis Info', the line that ends the block starts with the letter B (e.g., BOM).
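To see the failure concretely, here is a small sketch with a made-up BOM Info section between Board Info and Chassis Info (the sample data and the sample helper name are hypothetical). The \s{2} of the original command is written as two literal spaces, assuming the file is indented with spaces rather than tabs, so the command also runs in awks without \s support. Because "  BOM Info" begins with B, the end pattern rejects it and the range keeps printing until "  Chassis Info".

```shell
#!/bin/sh
# Hypothetical sample: a "BOM Info" section sits between Board and Chassis Info.
sample() {
  printf '%s\n' \
    '  Board Info: #512' \
    '    Manufacturer: "Dell Inc."' \
    '  BOM Info: #600' \
    '    Part: "X"' \
    '  Chassis Info: #768'
}

# The question's range, with \s{2} written as two literal spaces.
# "  BOM Info" starts with B, so [^B ] rejects it and the range runs on
# until "  Chassis Info", so the BOM section leaks into the output.
sample | awk '/^  Board Info/,/^  [^B ]/'
```

All five sample lines are printed, including the unwanted BOM section.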

If I use:

awk '/^\s{2}Board Info/,/^\s{2}\S*/' dump.txt

the ending pattern also matches the 'Board Info' line, so I only get that line. How do I get the indented block (4 leading spaces) without hard-coding the ending line (as above), relying instead on the ending pattern being 'the next line that starts with exactly 2 leading spaces'?
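A short sketch of why this happens, using an abbreviated copy of the question's block. Here /^\s{2}\S*/ is rewritten as /^  / (two literal spaces, assuming space indentation); since \S* can match the empty string, it adds no constraint anyway. In an awk range start,end the end pattern is also tested against the line that matched the start pattern, so a Board Info line that satisfies both opens and closes the range at once. The sample helper name is hypothetical.

```shell
#!/bin/sh
# Abbreviated copy of the question's block (space-indented).
sample() {
  printf '%s\n' \
    '  Board Info: #512' \
    '    Manufacturer: "Dell Inc."' \
    '    Product: "0X3D66"' \
    '  Chassis Info: #768'
}

# The end pattern /^  / also matches the "  Board Info" line itself,
# so the range starts and ends on that single line.
sample | awk '/^  Board Info/,/^  /'
```

Only the line "  Board Info: #512" is printed.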


Solution

  • I would ameliorate your code

    awk '/^\s{2}Board Info/,/^\s{2}[^B ]/' dump.txt
    

    in the following way. Let the content of dump.txt be

    undesired text
      Board Info: #512
        Manufacturer: "Dell Inc."
        Product: "0X3D66"
        Version: "A02"
        Serial: "..CN7016343F00IE."
      Other Info: #768
    another undesired text
    more undesired text
    

    then

    awk '/^\s{2}Board Info/,/^\s{2}[[:alpha:]]/&&!/^\s{2}Board Info/' dump.txt
    

    gives the output

      Board Info: #512
        Manufacturer: "Dell Inc."
        Product: "0X3D66"
        Version: "A02"
        Serial: "..CN7016343F00IE."
      Other Info: #768
    

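    Explanation: awk also tests the ending condition of a range against the line that matched the starting condition, so the Board Info line itself must be excluded from the ending condition. I altered the ending condition to require the line to start with 2 white-space characters followed by an alphabetic character AND (&&) NOT (!) be the Board Info line (by negating the start condition).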

    (tested in GNU Awk 5.3.1; note that \s is a GNU Awk regexp operator and may not be supported by other awk implementations)

    How do I modify what you have to just print the indented block?

    You might add an action that prints a line only if it starts with (at least) 3 white-space characters, in the following way

    awk '/^\s{2}Board Info/,/^\s{2}[[:alpha:]]/&&!/^\s{2}Board Info/{if(/^\s{3}/){print}}' dump.txt
    

    which gives the following output

        Manufacturer: "Dell Inc."
        Product: "0X3D66"
        Version: "A02"
        Serial: "..CN7016343F00IE."
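The same body can also be extracted without a range pattern, using an explicit flag. This is an alternative technique, not the range-based one from the answer: the next in the start rule makes it explicit that the closing test is never applied to the Board Info line itself, which is the problem the negated condition works around. The sketch assumes the indentation consists of literal spaces (not tabs) and uses only POSIX regex syntax, so it should also run in awks without \s; the function name board_body is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print only the indented body of the "Board Info" section.
# Assumes section headers start with exactly two spaces and body lines with more.
board_body() {
  awk '
    # Header found: enter the section. "next" skips the closing test below,
    # so the header line can never end the section it just opened.
    /^  Board Info/ { inside = 1; next }
    # Any later line with exactly two leading spaces starts another section.
    inside && /^  [^ ]/ { inside = 0 }
    # Inside the section, print lines with at least three leading spaces.
    inside && /^   / { print }
  ' "$@"
}

# Usage: reads the named files, or standard input when none are given.
printf '%s\n' \
  'undesired text' \
  '  Board Info: #512' \
  '    Manufacturer: "Dell Inc."' \
  '    Product: "0X3D66"' \
  '  Other Info: #768' \
  'another undesired text' | board_body
```

To keep the Board Info header line as well, add print before next in the first rule.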