Tags: linux, awk, terminal, analysis, leading-zero

Select only the rows where column 2 has 2 or more leading zeroes (in characters 5 to 10) in Linux


I want to select only the rows whose ID column ($2) has 2 or more leading zeroes in the part between the 5th and the 10th character. For example, the first row has the ID 156700923134; characters 5 to 10 (1567-009231-34) are 009231, which does start with leading zeroes. The second row has the ID 777754635373, so characters 5 to 10 are 546353, which has no leading zeroes. The file is pipe-delimited.

Example input:

1 | 156700923134 | hkohin | 23
4 | 777754635373 | hhkdys | 45
3 | 678387700263 | ieysff | 09

Expected output, either the whole row:

1 | 156700923134 | hkohin | 23

or just the ID:

156700923134

So far I can extract the substrings 009231, 546353 and 877002, but I don't know how to check them for leading zeroes.

This is the command I used to get that result:

    awk -F'|' '{print $2, substr($2, 5, 6) }' file.dat | head -5
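One way to finish the command above, sketched under an assumption: the substring it already extracts can be tested directly with the regex `^00`, which is true only when the substring starts with at least two zeros. The function name `filter_rows` is hypothetical, and the field separator ` *[|] *` is an assumption chosen so that the spaced sample rows shown in the question split the same way as a file without spaces.

```shell
#!/usr/bin/env bash
# Keep rows whose ID (field 2) has at least two leading zeros in the
# 6-character substring at positions 5..10 (substr($2, 5, 6)).
# Assumption: spaces around '|' may or may not be present, so FS is a regex.
filter_rows() {
  awk -F' *[|] *' 'substr($2, 5, 6) ~ /^00/'
}

# Demo on the sample rows from the question; prints only the first row.
printf '%s\n' \
  '1 | 156700923134 | hkohin | 23' \
  '4 | 777754635373 | hhkdys | 45' \
  '3 | 678387700263 | ieysff | 09' | filter_rows
```

A pattern with no action prints the whole matching line, which gives the first expected output format.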

Solution

  • The test in parentheses (the awk pattern) can be any valid expression, so match() can select the rows directly:

    # Assumes no spaces around the '|' delimiters (as the substr() output in the question implies)
    awk -F'|' '( match($2, "^....00") ) { print $2, substr($2, 5, 6) }' file.dat
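The match() test above is equivalent to the regex-match operator `~`: `/^....00/` means "any four characters, then 00", i.e. zeros at positions 5 and 6 of the ID. A minimal sketch that prints only the ID (the second accepted output format); the function name `print_matching_ids` is hypothetical, and the ` *[|] *` separator is an assumption that lets it also handle the spaced sample rows.

```shell
#!/usr/bin/env bash
# Print field 2 when its characters 5 and 6 are both "0".
# Assumption: optional spaces around '|' are absorbed by the FS regex.
print_matching_ids() {
  awk -F' *[|] *' '$2 ~ /^....00/ { print $2 }'
}

# Demo on the sample rows from the question; prints 156700923134.
printf '%s\n' \
  '1 | 156700923134 | hkohin | 23' \
  '4 | 777754635373 | hhkdys | 45' \
  '3 | 678387700263 | ieysff | 09' | print_matching_ids
```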
    

Answer #2:

A generic version takes a few more lines:
    
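    # Position where the run of zeros must start, and how many zeros are required
    zstart=5
    zcnt=3
    # Build a string of $zcnt zeros ("000"); eval lets brace expansion see the value of $zcnt (bash only)
    zeros=$(eval printf '0%.0s' {1..$zcnt})
    
    echo 'xxx|1234000890|end' |
      awk -F'|' -v zstart="$zstart" -v zcnt="$zcnt" -v zeros="$zeros" '
         ### debug { print substr($2, zstart, zcnt); }
         # Print the row when the zcnt characters starting at zstart are all zeros
         (zeros == substr($2, zstart, zcnt)) { print }'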
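Answer #2 checks for exactly `zcnt` zeros at a fixed position. A related sketch, assuming you want "N or more leading zeros" in a given substring: count the run of zeros with `match()` and `RLENGTH`, then compare against a minimum. The function `filter_min_zeros` and its parameters `start`, `len` and `min` are hypothetical names for illustration, and the ` *[|] *` separator is an assumption.

```shell
#!/usr/bin/env bash
# Keep a row when substr($2, start, len) begins with at least `min` zeros.
# Assumption: optional spaces around '|' are absorbed by the FS regex.
filter_min_zeros() {
  local start=$1 len=$2 min=$3
  awk -F' *[|] *' -v start="$start" -v len="$len" -v min="$min" '
    {
      s = substr($2, start, len)
      # match() returns 0 unless s starts with a zero; otherwise RLENGTH
      # is the length of the leading run of zeros
      nzeros = match(s, /^0+/) ? RLENGTH : 0
    }
    nzeros >= min'
}

# Demo: characters 5..10, at least 2 leading zeros; prints the first row.
printf '%s\n' \
  '1 | 156700923134 | hkohin | 23' \
  '4 | 777754635373 | hhkdys | 45' \
  '3 | 678387700263 | ieysff | 09' | filter_min_zeros 5 6 2
```

With this shape, asking for 3 or more zeros is just `filter_min_zeros 5 6 3`, with no need to build a string of zeros in the shell.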