I have seen tons of examples but cannot get any of them to work with the script from https://stackoverflow.com/a/72720612 by another user, @Just Khaithang, on this site. That script works great, but I also need to retain my column spacing, since it is critical.
This is the .txt file sample that I have posted here a couple of times. There is 1 space at the beginning of each line, 20 characters from the beginning of column 1 to the beginning of column 2, and 4 spaces between columns 2 and 3. See below for the script. It changes a date based on user input, using the variable $broken_date. The script is called from another shell script with awk -v. The " " spaces in between work, but since the width of column 1 varies, the output does not stay aligned.
 146327A             0000000020220422    000002012633825-0003-1
 137149D             0000000045220419    000004512632587-0003-0
 137050C             0000000018220419    000001812632410-0003-0
 137147A             0000000045220419    000004512632487-0003-0
 137233B             0000000144220421    000014412630711-0003-1
 137599B             0000000120220419    000012012632543-0003-0
 137604D             0000000015220419    000001512632588-0003-0
 151031-001E         0000000041220517    000004112575320-0003-1
 151248-001A         0000000021220421    000002112629944-0003-1
 151249-001A         0000000005220422    000000512634524-0003-1
 151827-002B         0000000040220421    000004012629223-0003-1
 127941B             0000000045220422    000004512634676-0003-1
 137105A             0000000020220421    000002012630791-0003-1
 132136A             0000000005220419    000000512632590-0003-0
 132137A             0000000005220419    000000512632591-0003-0
 134180D             0000000052220419    000006012622399-0003-1
 134307-004K         0000000016220420    000001612635621-0003-0
 141014-001B         0000000040220419    000004012632585-0003-0
{
    c2=$2
    c3=$3
    sub("0+","",c2)
    sub("0+","",c3)
    sub("-.*","",c3)
    if (length(c2) == 8) {
        c2_value=substr(c2,1,2)
    } else if (length(c2) == 9) {
        c2_value=substr(c2,1,3)
    }
    if (length(c3) == 10) {
        c3_value=substr(c3,1,2)
    } else if (length(c3) == 11) {
        c3_value=substr(c3,1,3)
    }
    if(c2_value != c3_value) {
        sub("[1-9].*$","",$2)
        date="$broken_date" # this value taken from user input
        print $1" "$2 c2_value broken_date" "$3
    } else {
        print $0
    }
}
The output should be:
 146327A             0000000020220422    000002012633825-0003-1
 137149D             0000000045220419    000004512632587-0003-0
 137050C             0000000018220419    000001812632410-0003-0
 137147A             0000000045220419    000004512632487-0003-0
 137233B             0000000144220421    000014412630711-0003-1
 137599B             0000000120220419    000012012632543-0003-0
 137604D             0000000015220419    000001512632588-0003-0
 151031-001E         0000000041220517    000004112575320-0003-1
 151248-001A         0000000021220421    000002112629944-0003-1
 151249-001A         0000000005220422    000000512634524-0003-1
 151827-002B         0000000040220421    000004012629223-0003-1
 127941B             0000000045220422    000004512634676-0003-1
 137105A             0000000020220421    000002012630791-0003-1
 132136A             0000000005220419    000000512632590-0003-0
 132137A             0000000005220419    000000512632591-0003-0
 134180D             0000000052220909    000006012622399-0003-1
 134307-004K         0000000016220420    000001612635621-0003-0
 141014-001B         0000000040220419    000004012632585-0003-0
The only difference is the date: in the 2nd column of the 3rd line from the bottom, the date is replaced with the value I entered, 220909. That is what the script needs to do.
I am doing this in a Korn shell via MKS Toolkit; awk reports file version 9.2.3.2096. This is on an old Windows XP machine.
Assumptions:
GNU awk/FIELDWIDTHS is available to OP (in comments OP mentions not being able to get FIELDWIDTHS to work, which I take to mean that OP is running GNU awk; otherwise I'd expect OP to state something about an error or FIELDWIDTHS not being available)
One idea for modifying OP's current code to work with GNU awk/FIELDWIDTHS:
broken_date='220909'
awk -v bdate="${broken_date}" '
BEGIN { FIELDWIDTHS="21 20 100"
        fmt="%-21s%-20s%s\n"               # define our printf format to match FIELDWIDTHS
      }
      { c2=$2; gsub(/ /,"",c2); sub("0+","",c2)
        c3=$3; gsub(/ /,"",c3); sub("0+","",c3); sub("-.*","",c3)
        if      (length(c2) ==  8) { c2_value=substr(c2,1,2) }
        else if (length(c2) ==  9) { c2_value=substr(c2,1,3) }
        if      (length(c3) == 10) { c3_value=substr(c3,1,2) }
        else if (length(c3) == 11) { c3_value=substr(c3,1,3) }
        if (c2_value != c3_value)  { printf fmt,$1,substr($2,1,length(gensub(/ /,"","g",$2))-6) bdate,$3 }
        else                       { print $0 }
      }
' x > y
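To see why the printf format keeps the columns aligned regardless of how long column 1 is, here is a minimal sketch that runs in any POSIX awk (no FIELDWIDTHS needed). The helper name pad_columns and the sample line are only illustrative, and it assumes the input fields are whitespace-delimited:

```shell
# pad_columns: hypothetical helper (not part of the solution above) that
# re-pads three whitespace-separated fields to the 21/20 column widths.
pad_columns() {
    # %-21s left-justifies the leading space + column 1 in 21 characters,
    # %-20s left-justifies column 2 in 20 characters (16 digits + 4 spaces).
    awk '{ printf "%-21s%-20s%s\n", " " $1, $2, $3 }'
}

# Example: a single-space-separated line comes back in the fixed-width layout.
printf '%s\n' '134180D 0000000052220909 000006012622399-0003-1' | pad_columns
```

Because the padding is computed by printf, a 7-character and an 11-character column 1 both leave column 2 starting at the same position.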
Reworking OP's logic (also addresses length(c3) == 9) while maintaining the FIELDWIDTHS approach:
broken_date='220909'
awk -v bdate="${broken_date}" '
BEGIN { FIELDWIDTHS="21 20 100"
        fmt="%-21s%-20s%s\n"
      }
      { c2=$2;
        gsub(/^[0]+| /,"",c2 )                # strip leading zeroes and all spaces
        c2=substr(c2,1,length(c2)-6)          # strip off last 6 characters
        pfx=$2                                # find the prefix of $2
        gsub(/ /,"",pfx)                      # strip all spaces
        pfx=substr(pfx,1,length(pfx)-6)       # strip off last 6 characters
        split($3,a,"-")                       # split $3 on hyphens
        c3=a[1]                               # grab 1st hyphen delimited field
        gsub(/^[0]+| /,"",c3)                 # strip leading zeroes and all spaces
        c3=substr(c3,1,length(c3)-8)          # strip off last 8 characters
        if (c2 != c3) $2=pfx bdate            # replace $2 with its prefix + bdate (aka broken_date)
        printf fmt,$1,$2,$3
      }
' x > y
Both of these generate:
$ diff x y
16c16
<  134180D             0000000052220419    000006012622399-0003-1
---
>  134180D             0000000052220909    000006012622399-0003-1
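FIELDWIDTHS is a GNU awk extension, so if it turns out not to be supported (the awk shipped with MKS Toolkit may well not have it; that is an assumption, not something I have tested), the same fixed-width slicing can be sketched with substr(), which every POSIX awk provides. The function name fix_dates is hypothetical, and the column offsets (1-21, 22-41, 42 onward) are assumed from the layout described in the question:

```shell
# fix_dates: hypothetical helper; reads the fixed-width file on stdin and
# takes the replacement date as its first argument.
fix_dates() {
    awk -v bdate="$1" '
    {
        f1 = substr($0, 1, 21)                # leading space + column 1, padded
        f2 = substr($0, 22, 20)               # column 2 + 4 trailing spaces
        f3 = substr($0, 42)                   # column 3 (rest of line)

        c2 = f2; gsub(/ /, "", c2)            # column 2 without padding
        pfx = substr(c2, 1, length(c2) - 6)   # zero-padded quantity, date removed
        q2 = pfx; sub(/^0+/, "", q2)          # quantity from column 2

        split(f3, a, "-")                     # keep the part before the first hyphen
        q3 = a[1]; gsub(/ /, "", q3)
        sub(/^0+/, "", q3)                    # strip leading zeroes
        q3 = substr(q3, 1, length(q3) - 8)    # drop the trailing 8 digits

        if (q2 != q3) f2 = pfx bdate          # quantities differ: swap in the new date
        printf "%-21s%-20s%s\n", f1, f2, f3
    }'
}
```

With the same files as above this would be run as fix_dates "$broken_date" < x > y, and diff x y should then be expected to report only the one changed line.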