python pandas dataframe libphonenumber

How can I handle invalid phone numbers when using Python's phonenumbers package with apply?


I have a dataframe containing a variety of phone numbers, and I want to extract the time zone for each one. I am using apply to loop over the rows of the dataframe as follows

external_calls_cleaned_df['time_zone'] = external_calls_cleaned_df.apply(lambda x: timezone.time_zones_for_number(phonenumbers.parse(str(x.external_number), None)), axis=1)

This works fine as long as every value in x.external_number is a valid phone number; however, if even one invalid phone number appears anywhere in the series, the whole call fails.

What I would like it to do is return 'Null' or None (anything, really) whenever it hits an invalid number. I can filter those out after the fact, but I don't want the process to stop at that point.

I have tried wrapping the timezone call in a new function and running it inside a try block

def get_timezone(df):
    try:
        x = timezone.time_zones_for_number(phonenumbers.parse(str(df.external_number), None))
    except:
       None
    return x

and then using

external_calls_cleaned_df['time_zone'] = external_calls_cleaned_df.apply(lambda x:get_timezone(x), axis=1)

The process then completes, but it fills the 'time_zone' field with None for every value.

To accomplish this I am using the phonenumbers package, which is a Python port of Google's libphonenumber Java library.

I can't share the phone numbers in my database for obvious reasons, so I don't know how to turn this into a reproducible example, or I would provide it.

Can anyone help me?

Thanks, Brad


Solution

  • Try refactoring your code to use map on the target column "external_number" instead of apply on the whole dataframe, and return the result directly from inside the try block, like this:

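    import phonenumbers
    from phonenumbers import timezone


    def get_timezone(x):
        """Return the time zones for one phone number, or None on failure."""
        try:
            return timezone.time_zones_for_number(phonenumbers.parse(str(x), None))
        except Exception:
            # phonenumbers.parse raises NumberParseException on unparseable input;
            # catching Exception (not a bare except) still lets Ctrl-C through.
            return None


    # map calls get_timezone once per value in the column
    external_calls_cleaned_df["time_zone"] = external_calls_cleaned_df[
        "external_number"
    ].map(get_timezone)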
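
If the same "return None instead of failing" behaviour is needed for more than one lookup, the try/except can be factored into a small wrapper. The following is only a sketch: `none_on_error` is a hypothetical helper name, not part of pandas or phonenumbers, and the built-in `int` stands in for the phone-number lookup so the example runs without either library installed.

```python
def none_on_error(func, exceptions=(Exception,)):
    """Wrap func so the listed exceptions produce None instead of propagating.

    Hypothetical helper for illustration; not part of pandas or phonenumbers.
    """
    def wrapper(value):
        try:
            return func(value)
        except exceptions:
            return None
    return wrapper


# Stand-in for the phone-number lookup: int() raises ValueError on bad input,
# in the same way a parser raises on a value it cannot handle.
safe_int = none_on_error(int, exceptions=(ValueError,))

raw_values = ["42", "not a number", "7"]
results = [safe_int(v) for v in raw_values]
print(results)  # [42, None, 7]
```

A wrapped function built this way could be passed to Series.map in place of get_timezone; the failed rows then hold None and can be filtered out afterwards.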