I have the following lines in a file
$ cat test.txt
69|1074330570|1,sip:+121345633210x3Bverstat=TN-Validation-Passed|tel:+12134565534|0
69|1077822111|2,;tel:+2223120011~sip:+2223120051@vzpps.com;|sip:+13123120022@vzpps.com|0
69|1077988012|1,sip:+121510016070x3Bverstat=TN-Validation-Passed|tel:+136965339510x3Bnpd|0
I want to replace the third and fourth columns in the file with just phone numbers as following:
69|1074330570|2134563321|2134565534|0
69|1077822111|2223120011|3123120022|0
69|1077988012|2151001607|3696533951|0
The good part is that all files will have a '+' in third and fourth columns. Now the difficult part is that sometimes we get 11 digits as noticed in the first line third column (121345633210) and sometimes there is no '1' added after a '+'. So if there is a 1 after the plus sign then exclude 1. Otherwise, start the length after the '+' sign. Similarly, if there are 11 digits, just substring 10. Also, if there are more than 1 numbers like line 2, I need to pick up the first number after the '+' sign only like 2223120011 and not 2223120051.
I tried the following
awk -F"|" 'BEGIN {FS="+"}{print substr($3,2,10)}' test.txt
But it is giving incorrect value for second line since it needs to start after the '+' sign.
Issues with the current code:
-F"|"
and FS="+"
both define the input field delimiter so, technically, you should pick one or the other but not both; in this case FS="+"
takes precedence (ie, -F"|"
is ignored)-F"|"
, but also define it as the output field delimiter (recommend: BEGIN { FS=OFS="|"}
awk
string functions (eg, split()
, match()
, index()
, substr()
)Assumptions/Understandings:
+
+
is always a 10-digit or 11-digit number+
then we're only interested in the first +
(eg, we don't have to further parse based on tel
or sip
)One awk
idea:
awk '
BEGIN { FS=OFS="|" }
{ for (i=3; i<=4; i++) { # loop through 3rd and 4th fields
split($i,a,"+") # split on "+"
d1 = substr(a[2],1,1) # get 1st digit after the "+"
$i = substr(a[2],(d1==1 ? 2 : 1),10) # redefine ith field based on value of 1st digit
}
}
1 # print current line
' test.txt
NOTES:
This generates:
69|1074330570|2134563321|2134565534|0
69|1077822111|2223120011|3123120022|0
69|1077988012|2151001607|3696533951|0