Tags: linux, unix, awk, text-processing, gawk

Using Awk to process a file where each record has different fixed-width fields


I have some data files from a legacy system that I would like to process using Awk. Each file consists of a list of records. There are several different record types, and each record type has a different set of fixed-width fields (there is no field separator character). The first two characters of the record indicate the type; from this you know which fields should follow. A file might look something like this:

AAField1Field2LongerField3
BBField4Field5Field6VeryVeryLongField7Field8
CCField99

Using Gawk I can set the FIELDWIDTHS, but that applies to the whole file (unless I am missing some way of setting this on a record-by-record basis), or I can set FS to "" and process the file one character at a time, but that's a bit cumbersome.

Is there a good way to extract the fields from such a file using Awk?

Edit: Yes, I could use Perl (or something else). I'm still keen to know whether there is a sensible way of doing it with Awk though.


Solution

  • Hopefully this will lead you in the right direction. Assuming your multi-line records are guaranteed to be terminated by a 'CC'-type row, you can process the file with simple if/then logic. Presuming you want fields 1, 5, and 7 on one output row, a sample awk script would be:

    BEGIN {
        field1 = ""
        field5 = ""
        field7 = ""
    }
    {
        # The first two characters identify the record type
        record_type = substr($0, 1, 2)
        if (record_type == "AA")
        {
            field1 = substr($0, 3, 6)
        }
        else if (record_type == "BB")
        {
            field5 = substr($0, 9, 6)
            field7 = substr($0, 21, 18)
        }
        else if (record_type == "CC")
        {
            # 'CC' ends the record group, so emit the collected fields
            print field1 "|" field5 "|" field7
        }
    }
    

    Create an awk script file called program.awk and put that code into it. Execute the script using:

    awk -f program.awk < my_multi_line_file.txt
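
    To check the script end to end, you can feed it the sample data from the question. This is a minimal sketch; the file name records.txt is arbitrary, and the awk program is inlined here rather than read from program.awk:

    ```shell
    # Write the question's sample records to a file (name is arbitrary)
    cat > records.txt <<'EOF'
    AAField1Field2LongerField3
    BBField4Field5Field6VeryVeryLongField7Field8
    CCField99
    EOF

    # Same logic as program.awk, inlined: stash fields from AA/BB rows,
    # print the collected values when the terminating CC row arrives
    awk '
    {
        record_type = substr($0, 1, 2)
        if (record_type == "AA")
            field1 = substr($0, 3, 6)
        else if (record_type == "BB") {
            field5 = substr($0, 9, 6)
            field7 = substr($0, 21, 18)
        }
        else if (record_type == "CC")
            print field1 "|" field5 "|" field7
    }' records.txt
    # → Field1|Field5|VeryVeryLongField7
    ```

    Note that gawk (as opposed to POSIX awk) does also let you reassign FIELDWIDTHS per record and force a re-split with `$0 = $0`, but the plain substr() approach above works in any awk.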