
How to extract all phone numbers from each row of a DataFrame using the phonenumbers Python library?


I want to create a column that holds all valid phone numbers found in each row of the text column of a DataFrame, using Python's phonenumbers library.

import pandas as pd
import phonenumbers

complains = ['If you validate your data, your confirmation number is 1-23-456-789, for a teacher you will be debited on the 3rd of each month 41.99, you will pay for the remaining 3 services offered:n/a',
             'EMAIL VERIFYED, 12345 1st STUDENT 400 88888 2nd STUDENT 166.93 Your request has been submitted and your confirmation number is 1-234-567-777 speed is increased to 250MB $80.99 BILLING CYCLE 18',
             'ADJUSTMENT FROM NOVEMBER TO MAY $80.99 Appointment for equipment change 7878940142']

complainsdf = pd.DataFrame(complains, index=['1', '2', '3'], columns=['text'])

I tried the code below, but it didn't give the results I expected.

complainsdf['tel'] = complainsdf.apply(lambda row: 
    phonenumbers.PhoneNumberMatcher(row['text'], "US"), axis=1)

complainsdf['tel'][0] returns <phonenumbers.phonenumbermatcher.PhoneNumberMatcher at 0x2623ebfddf0> instead of the expected phone number.


Solution

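  • Each text can contain several phone numbers, but PhoneNumberMatcher does not return them as a list. It is an iterator that finds the matches lazily, so your tel column ends up holding the phonenumbers.PhoneNumberMatcher objects themselves.

    To get the numbers out, iterate over the matcher (for example, in a list comprehension) and format the parsed number of each match. For instance: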

    import phonenumbers

    def get_phone_numbers(x):
        # Find every phone number in the text; numbers without a country code are read as US numbers
        nums = phonenumbers.PhoneNumberMatcher(x, "US")
        # Format the parsed number of each match as an E.164 string
        return [phonenumbers.format_number(num.number, phonenumbers.PhoneNumberFormat.E164) for num in nums]
    
    complainsdf['tel'] = complainsdf['text'].apply(get_phone_numbers)
    complainsdf
    

                                                     text   tel
    1   If you validate your data, your confirmation n...   []
    2   EMAIL VERIFYED, 12345 1st STUDENT 400 88888 2n...   []
    3   ADJUSTMENT FROM NOVEMBER TO MAY $80.99 Appoint...   [+17878940142]
    

    The E164 format comes from the library's documentation. If it doesn't suit your case, phonenumbers.PhoneNumberFormat also provides INTERNATIONAL, NATIONAL and RFC3966.