I want to select only the rows where the ID column ($2) has two or more leading zeroes in the substring spanning its 5th to 10th characters. For example, if column 2 holds the ID 156700923134, the 5th to 10th characters (1567-009231-34) are 009231, which does have leading zeroes. In the second row the ID is 777754635373, which gives 546353, and that has no leading zeroes. I am working on a pipe-delimited file.
Ex:
1 | 156700923134 | hkohin | 23
4 | 777754635373 | hhkdys | 45
3 | 678387700263 | ieysff | 09
Expected output:
1 | 156700923134 | hkohin | 23
--OR--
156700923134
So far I can extract the substrings 009231, 546353 and 877002, but I don't know how to check them for leading zeroes.
This is what I used to get the above result:
awk -F'|' '{print $2, substr($2, 5, 6) }' file.dat | head -5
Answer #1:
The test condition in () in front of the action can be any valid expression, so the check for zeroes at positions 5 and 6 can go there:
awk -F'|' '( match($2, "^....00") ) { print $2, substr($2, 5, 6) }' file.dat
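Since the expected output in the question is the whole row, the same idea can also be written as a test on the extracted substring, with no action block so that awk prints the matching record. This is only a minimal sketch: it assumes the real file has no spaces around the pipes (as the substr output in the question suggests), and the temporary file and the `sample` and `result` variable names are made up for the demonstration.

```shell
# Sample rows from the question, assumed to have no spaces around the pipes.
sample=$(mktemp)
printf '%s\n' \
  '1|156700923134|hkohin|23' \
  '4|777754635373|hhkdys|45' \
  '3|678387700263|ieysff|09' > "$sample"

# Keep rows where characters 5-10 of field 2 start with at least two zeroes.
# A pattern without an action prints the whole matching record.
result=$(awk -F'|' 'substr($2, 5, 6) ~ /^00/' "$sample")
echo "$result"
rm -f "$sample"
```

To print only the ID instead of the full row, append the action { print $2 } after the pattern.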
Answer #2:
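A generic version takes a few more lines; the start position and the number of zeroes become parameters (the brace expansion used to build the zeros string needs bash):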
zstart=5
zcnt=3
zeros=$(eval printf '0%.0s' {1..$zcnt})
echo 'xxx|1234000890|end' |
awk -F'|' -vzstart=$zstart -vzcnt=$zcnt -vzeros="$zeros" '
### debug { print substr($2, zstart, zcnt); }
(zeros == substr($2, zstart, zcnt)) { print }'
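The string of zeroes can also be built inside awk itself, which avoids the eval and the bash-only brace expansion. The following is a sketch under the same assumptions as above (no spaces around the pipes); zcnt is set to 2 here to match the "2 or more" requirement in the question, and the `result` variable is only there for the demonstration.

```shell
zstart=5   # 1-based position in field 2 where the zeroes must begin
zcnt=2     # minimum number of consecutive zeroes required at that position

result=$(printf '%s\n' \
  '1|156700923134|hkohin|23' \
  '4|777754635373|hhkdys|45' \
  '3|678387700263|ieysff|09' |
awk -F'|' -v zstart="$zstart" -v zcnt="$zcnt" '
  # Build a string of zcnt zeroes once, before any input is read.
  BEGIN { for (i = 1; i <= zcnt; i++) zeros = zeros "0" }
  # Print the record when the zcnt characters starting at zstart are all zeroes.
  substr($2, zstart, zcnt) == zeros
')
echo "$result"
```

Because only the first zcnt characters from zstart are compared, an ID with more zeroes than required at that position still matches.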