shell awk mks

Need to retain column spacing in awk script


I have seen tons of examples, but I cannot get any of them to work in this script, which comes from https://stackoverflow.com/a/72720612 by another user on this site, @Just Khaithang. The script works great, but I also need to retain my column spacing, since it is critical.

Below is a sample of the .txt file, which I have posted here a couple of times. There is 1 space at the beginning of each line, 20 character positions from the beginning of column 1 to the beginning of column 2, and 4 spaces between columns 2 and 3.

The script follows the sample. It replaces a date with one taken from user input, hence the variable $broken_date. The script is called from another shell script with awk -v. The quoted runs of spaces between the columns work, but because the width of column 1 varies, the output does not stay aligned.

 146327A             0000000020220422    000002012633825-0003-1
 137149D             0000000045220419    000004512632587-0003-0
 137050C             0000000018220419    000001812632410-0003-0
 137147A             0000000045220419    000004512632487-0003-0
 137233B             0000000144220421    000014412630711-0003-1
 137599B             0000000120220419    000012012632543-0003-0
 137604D             0000000015220419    000001512632588-0003-0
 151031-001E         0000000041220517    000004112575320-0003-1
 151248-001A         0000000021220421    000002112629944-0003-1
 151249-001A         0000000005220422    000000512634524-0003-1
 151827-002B         0000000040220421    000004012629223-0003-1
 127941B             0000000045220422    000004512634676-0003-1
 137105A             0000000020220421    000002012630791-0003-1
 132136A             0000000005220419    000000512632590-0003-0
 132137A             0000000005220419    000000512632591-0003-0
 134180D             0000000052220419    000006012622399-0003-1
 134307-004K         0000000016220420    000001612635621-0003-0
 141014-001B         0000000040220419    000004012632585-0003-0

{
    c2=$2
    c3=$3
    sub("0+","",c2)
    sub("0+","",c3)
    sub("-.*","",c3)
    if (length(c2) == 8) {
        c2_value=substr(c2,1,2)
    } else if (length(c2) == 9) {
        c2_value=substr(c2,1,3)
    }

    if (length(c3) == 10) {
        c3_value=substr(c3,1,2)
    } else if (length(c3) == 11) {
        c3_value=substr(c3,1,3)
    }

    if(c2_value != c3_value) {
        sub("[1-9].*$","",$2)
        # broken_date is supplied by the calling script via awk -v (value taken from user input)
        print  $1"            "$2 c2_value broken_date"   "$3
    } else {
        print $0
    }
}

The output should be:

 146327A             0000000020220422    000002012633825-0003-1
 137149D             0000000045220419    000004512632587-0003-0
 137050C             0000000018220419    000001812632410-0003-0
 137147A             0000000045220419    000004512632487-0003-0
 137233B             0000000144220421    000014412630711-0003-1
 137599B             0000000120220419    000012012632543-0003-0
 137604D             0000000015220419    000001512632588-0003-0
 151031-001E         0000000041220517    000004112575320-0003-1
 151248-001A         0000000021220421    000002112629944-0003-1
 151249-001A         0000000005220422    000000512634524-0003-1
 151827-002B         0000000040220421    000004012629223-0003-1
 127941B             0000000045220422    000004512634676-0003-1
 137105A             0000000020220421    000002012630791-0003-1
 132136A             0000000005220419    000000512632590-0003-0
 132137A             0000000005220419    000000512632591-0003-0
 134180D             0000000052220909    000006012622399-0003-1
 134307-004K         0000000016220420    000001612635621-0003-0
 141014-001B         0000000040220419    000004012632585-0003-0

The only difference is the date in the 2nd column of the 3rd line from the bottom, where the 220909 I entered has been substituted. That is exactly what the script needs to do.

I am doing this in a Korn shell via MKS Toolkit on an old Windows XP machine; awk reports file version 9.2.3.2096.


Solution

  • Assumptions:

    One idea for modifying OP's current code to work with GNU awk/FIELDWIDTHS:

    broken_date='220909'
    
    awk -v bdate="${broken_date}" '
    BEGIN  { FIELDWIDTHS="21 20 100"
             fmt="%-21s%-20s%s\n"                # define our printf format to match FIELDWIDTHS
           }
           { c2=$2; gsub(/ /,"",c2); sub("0+","",c2)
             c3=$3; gsub(/ /,"",c3); sub("0+","",c3); sub("-.*","",c3)
    
                  if (length(c2) == 8)  { c2_value=substr(c2,1,2) }
             else if (length(c2) == 9)  { c2_value=substr(c2,1,3) }
    
                  if (length(c3) == 10) { c3_value=substr(c3,1,2) }
             else if (length(c3) == 11) { c3_value=substr(c3,1,3) }
    
             if (c2_value != c3_value) { printf fmt,$1,substr($2,1,length(gensub(/ /,"","g",$2))-6) bdate,$3 }
             else                      { print $0 }
           }
    ' x > y
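
The printf format is what keeps the columns aligned: `%-Ns` left-justifies a value and pads it with spaces to N characters, so a shorter or longer column 1 no longer shifts what follows. A minimal sketch of that behavior, using only POSIX awk features; the widths 12 and 6 and the helper name `pad_demo` are made up for illustration and are not part of OP's data layout:

```shell
#!/bin/sh
# Hypothetical helper: print the first two whitespace-separated fields,
# each left-justified and padded to a fixed width (12 and 6 characters).
pad_demo() {
    awk '{ printf "[%-12s][%-6s]\n", $1, $2 }'
}

# Column 1 has different lengths, yet the closing brackets line up:
printf '%s\n' '137149D 45' '151031-001E 41' | pad_demo
# [137149D     ][45    ]
# [151031-001E ][41    ]
```

With a fixed string of spaces between the fields, as in OP's print statement, the second bracket would start 4 characters later on the second line.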
    

    Reworking OP's logic (this also handles the case length(c3) == 9) while keeping the FIELDWIDTHS approach:

    broken_date='220909'
    
    awk -v bdate="${broken_date}" '
    BEGIN  { FIELDWIDTHS="21 20 100"
             fmt="%-21s%-20s%s\n"
           }
           { c2=$2;
             gsub(/^[0]+| /,"",c2 )                    # strip leading zeroes and all spaces
             c2=substr(c2,1,length(c2)-6)              # strip off last 6 characters
    
             pfx=$2                                    # find the prefix of $2
             gsub(/ /,"",pfx)                          # strip all spaces
             pfx=substr(pfx,1,length(pfx)-6)           # strip off last 6 characters
    
             split($3,a,"-")                           # split $3 on hyphens
             c3=a[1]                                   # grab 1st hyphen delimited field
             gsub(/^[0]+| /,"",c3)                     # strip leading zeroes and all spaces
             c3=substr(c3,1,length(c3)-8)              # strip off last 8 characters
    
             if (c2 != c3) $2=pfx bdate                # replace $2 with its prefix + bdate (aka broken_date)
    
             printf fmt,$1,$2,$3
           }
    ' x > y
    

    Both of these generate:

    $ diff x y
    16c16
    <  134180D             0000000052220419    000006012622399-0003-1
    ---
    >  134180D             0000000052220909    000006012622399-0003-1
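
FIELDWIDTHS and gensub() are GNU awk extensions, so the scripts above may not run under the MKS Toolkit awk mentioned in the question. As a rough sketch only, the same fixed-width slicing can be done with substr(), which every POSIX awk provides. It assumes the layout described in the question (column 1 in characters 1-21, column 2 in characters 22-41, column 3 in the rest); the function name `fix_dates` is hypothetical, and this has not been tried on MKS awk:

```shell
#!/bin/sh
# Hypothetical helper: reads records on stdin, $1 is the replacement date.
fix_dates() {
    awk -v bdate="$1" '
    {
        f1 = substr($0, 1, 21)                 # column 1 including its padding
        f2 = substr($0, 22, 20)                # column 2 including trailing spaces
        f3 = substr($0, 42)                    # everything after column 2

        c2 = f2; gsub(/ /, "", c2)             # column 2 without padding
        pfx = substr(c2, 1, length(c2) - 6)    # zero-padded quantity, date removed
        q2 = pfx; sub(/^0+/, "", q2)           # quantity without leading zeroes

        c3 = f3; sub(/-.*/, "", c3)            # drop everything from the first hyphen
        gsub(/ /, "", c3); sub(/^0+/, "", c3)  # strip spaces and leading zeroes
        q3 = substr(c3, 1, length(c3) - 8)     # quantity embedded in column 3

        if (q2 != q3) f2 = pfx bdate           # quantities differ: replace the date
        printf "%-21s%-20s%s\n", f1, f2, f3    # re-pad to the original widths
    }'
}

# Two sample records, built with printf so the spacing matches the layout above.
{
    printf ' %-20s%-20s%s\n' 132137A 0000000005220419 000000512632591-0003-0
    printf ' %-20s%-20s%s\n' 134180D 0000000052220419 000006012622399-0003-1
} | fix_dates 220909
```

Only the second record should come back changed, with 220419 replaced by 220909 and the column positions untouched.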