I want to create a column containing all valid phone numbers found in each row of the text column of a data frame, using Python's phonenumbers library.
import pandas as pd
import phonenumbers
complains = ['If you validate your data, your confirmation number is 1-23-456-789, for a teacher you will be debited on the 3rd of each month 41.99, you will pay for the remaining 3 services offered:n/a',
'EMAIL VERIFYED, 12345 1st STUDENT 400 88888 2nd STUDENT 166.93 Your request has been submitted and your confirmation number is 1-234-567-777 speed is increased to 250MB $80.99 BILLING CYCLE 18',
'ADJUSTMENT FROM NOVEMBER TO MAY $80.99 Appointment for equipment change 7878940142']
complainsdf = pd.DataFrame(complains, index=['1', '2', '3'], columns=['text'])
I tried the code below, but I didn't get the results I expected.
complainsdf['tel'] = complainsdf.apply(lambda row:
phonenumbers.PhoneNumberMatcher(row['text'], "US"), axis=1)
complainsdf['tel'][0]
gives me the following output:
<phonenumbers.phonenumbermatcher.PhoneNumberMatcher at 0x2623ebfddf0>
and not the expected phone number.
The column tel can contain multiple phone numbers per row. They are stored as an object of type phonenumbers.PhoneNumberMatcher.
To extract the phone numbers themselves, you have to iterate over that object with a loop. For instance, you can do:
def get_phone_numbers(x):
    # Extract the phone numbers from the text
    nums = phonenumbers.PhoneNumberMatcher(x, "US")
    # Convert the phone number format
    return [phonenumbers.format_number(num.number, phonenumbers.PhoneNumberFormat.E164) for num in nums]
complainsdf['tel'] = complainsdf['text'].apply(get_phone_numbers)
complainsdf
                                                text             tel
1  If you validate your data, your confirmation n...              []
2  EMAIL VERIFYED, 12345 1st STUDENT 400 88888 2n...              []
3  ADJUSTMENT FROM NOVEMBER TO MAY $80.99 Appoint...  [+17878940142]
I found the way to convert the format with PhoneNumberFormat.E164 in the documentation. You may need to adapt it to your case.