Remove Telephone numbers using Regular Expressions in Python 3


I am trying to remove telephone numbers from a bunch of documents that I have parsed with Tika, but without success.

Here is a screenshot from the regex101 validator. As you can see, the phone numbers are not matched.

Here is the same example as plain text:

"Something here

and something here 9, but (I have something here as well), 123456, Hi guys!

+39.1234.325636 +39.321.1234567

sex male | date of birth 16/12/1927 | nationality italian

some stuff "

This is my Regex (I am not an expert in this field):

(\(00\d{2}\)|\(\+\d{2}\)|00\d{2}|\+\d{2})[\. ]??3\d{2}[\. \-]??\d{2,4}[\. \-]??\d{2,4}$

Notice that +39 (or 0039) is fixed and the first 3 in the second telephone number is also fixed.
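A quick way to see why the pattern above misses the numbers is to run it with Python's standard `re` module against the sample text. This is a sketch for diagnosis only; it suggests two separate causes: `$` anchors to the end of the whole string unless `re.MULTILINE` is set, and the mandatory `3` right after the prefix rules out `+39.1234.325636`.

```python
import re

# Sample text from the question.
text = """Something here

and something here 9, but (I have something here as well), 123456, Hi guys!

+39.1234.325636 +39.321.1234567

sex male | date of birth 16/12/1927 | nationality italian

some stuff """

# The pattern exactly as written in the question.
pattern = r"(\(00\d{2}\)|\(\+\d{2}\)|00\d{2}|\+\d{2})[\. ]??3\d{2}[\. \-]??\d{2,4}[\. \-]??\d{2,4}$"

# Without re.MULTILINE, `$` only matches at the end of the string
# (or just before a final newline), so nothing is found.
print(re.search(pattern, text))  # None

# With re.MULTILINE, `$` also matches at the end of each line, so the
# second number (which ends its line) is found...
hit = re.search(pattern, text, re.MULTILINE)
print(hit.group(0))  # +39.321.1234567

# ...but the first number still never matches: the pattern requires a `3`
# right after "+39." and that number continues with "1234".
print([m.group(0) for m in re.finditer(pattern, text, re.MULTILINE)])
# ['+39.321.1234567']
```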

Do you have any suggestions? Many thanks.


Solution

  • This works for me on the regex101 validator with your input: (\+|00)39\.[0-9]+\.[0-9]+
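To actually remove the numbers in Python 3, the same pattern can be passed to `re.sub`. A minimal sketch, assuming the documents are already plain strings: the group is made non-capturing with `(?:...)`, because with one capturing group `re.findall` returns only the group's text (`'+'`), not the whole number. The trailing-space cleanup is an extra step added here for tidiness, not part of the answer's pattern.

```python
import re

text = """Something here

and something here 9, but (I have something here as well), 123456, Hi guys!

+39.1234.325636 +39.321.1234567

sex male | date of birth 16/12/1927 | nationality italian

some stuff """

# With the capturing group, findall returns only what the group matched.
print(re.findall(r"(\+|00)39\.[0-9]+\.[0-9]+", text))  # ['+', '+']

# Same pattern with a non-capturing group: findall returns whole numbers.
phone = re.compile(r"(?:\+|00)39\.[0-9]+\.[0-9]+")
print(phone.findall(text))  # ['+39.1234.325636', '+39.321.1234567']

# The 0039 form of the prefix is covered too.
print(phone.findall("call 0039.06.1234567 now"))  # ['0039.06.1234567']

# Delete the numbers, then trim the spaces they leave at line ends.
cleaned = phone.sub("", text)
cleaned = re.sub(r"[ \t]+$", "", cleaned, flags=re.MULTILINE)
print(cleaned)
```

Note that this pattern assumes dots between the digit groups, as in the sample; numbers written with spaces or dashes would need something like `[. -]` in place of `\.`.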