I am trying to remove telephone numbers from a set of documents that I have parsed with Tika, but I have not succeeded.
Here is a screenshot from the regex101 validator. As you can see, the phone numbers are not matched.
The same example in text format is the following:
"Something here
and something here 9, but (I have something here as well), 123456, Hi guys!
+39.1234.325636 +39.321.1234567
sex male | date of birth 16/12/1927 | nationality italian
some stuff "
This is my Regex (I am not an expert in this field):
(\(00\d{2}\)|\(\+\d{2}\)|00\d{2}|\+\d{2})[\. ]??3\d{2}[\. \-]??\d{2,4}[\. \-]??\d{2,4}$
Note that the +39 (or 0039) prefix is fixed, and the leading 3 in the second telephone number is also fixed.
Do you have any suggestions? Many thanks.
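Two things in your pattern get in the way: the trailing $ only allows a match at the end of a line (or only at the end of the whole input if multiline mode is off), and 3\d{2} requires the group after the prefix to start with 3, so +39.1234.325636 can never match. This works for me on the regex101 validator given your input: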
(\+|00)39\.[0-9]+\.[0-9]+
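
For the actual removal step, here is a minimal sketch in Python, assuming the text extracted by Tika is available as a plain string. The names `PHONE_RE`, `remove_phone_numbers`, and `sample` are made up for illustration; only the pattern itself comes from the answer above.

```python
import re

# Pattern from the answer above: "+39" or "0039", then two dot-separated digit groups.
PHONE_RE = re.compile(r"(\+|00)39\.[0-9]+\.[0-9]+")


def remove_phone_numbers(text: str) -> str:
    """Return text with every match of PHONE_RE deleted (hypothetical helper)."""
    return PHONE_RE.sub("", text)


# Sample input copied from the question.
sample = (
    "Something here\n"
    "and something here 9, but (I have something here as well), 123456, Hi guys!\n"
    "+39.1234.325636 +39.321.1234567\n"
    "sex male | date of birth 16/12/1927 | nationality italian\n"
    "some stuff "
)

cleaned = remove_phone_numbers(sample)
print(cleaned)
```

Because the pattern has no $ anchor, both numbers on the same line are removed, while 123456 and the date of birth are left alone since they lack the +39 or 0039 prefix. Note that this pattern only covers dot separators; numbers written with spaces, hyphens, or a parenthesized prefix, which the regex in the question tried to allow, would need extra alternatives.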