pythonpandas

Python: columns must be same length as key when splitting a column


I have two address columns and I want to extract the last word from the first column and the first word from the second column. In the provided example there aren't two words in column 'Address2', but I want to build the code in such a way that it will work regardless of how the dataset will look like. Sometimes the address2 can be one word, something it will have 2, etc..

data = {
    'Address1': ['3 Steel Street', '1 Arnprior Crescent', '40 Bargeddie Street Blackhill'],
    'Address2': ['Saltmarket', 'Castlemilk', 'Blackhill']
}

df = pd.DataFrame(data)

I have no problem with column 'Address1':

df[['StringStart', 'LastWord']] = df['Address1'].str.rsplit(' ', n=1, expand=True)

The problem comes with column 'Address2' where if I apply the above code I an error: Columns must be same length as key

I understand where the problem is coming from - I am trying to split one column which has one element into two columns. I am sure there is a way in which this can be handled to allow the split anyway and return Null if there isn't a word and a value if there is.


Solution

  • Using str.extract() might be better for several reasons: it handles all cases, offers precision with regular expressions, and eliminates the risk of value errors.

    import pandas as pd
    
    data = {
        'Address1': ['3 Steel Street', '1 Arnprior Crescent', '40 Bargeddie Street Blackhill'],
        'Address2': ['Saltmarket', 'Castlemilk East', 'Blackhill']
    }
    df = pd.DataFrame(data)
    
    df[['StringStart', 'LastWord']] = df['Address1'].str.rsplit(' ', n=1, expand=True)
    
    df[['FirstWord_Address2', 'Remaining_Address2']] = (
        df['Address2'].str.extract(r'^(\S+)\s*(.*)$')
    )
    
    print(df)
    

    Or:

    df[['Address1_Prefix', 'Address1_LastWord']] = df['Address1'].str.extract(r'^(.*\b)\s+(\S+)$')
    
    df[['Address2_FirstWord', 'Address2_Remaining']] = df['Address2'].str.extract(r'^(\S+)\s*(.*)$')
    

    Output:

                            Address1         Address2          StringStart   LastWord FirstWord_Address2 Remaining_Address2
    0                 3 Steel Street       Saltmarket              3 Steel     Street         Saltmarket
    1            1 Arnprior Crescent  Castlemilk East           1 Arnprior   Crescent         Castlemilk               East
    2  40 Bargeddie Street Blackhill        Blackhill  40 Bargeddie Street  Blackhill          Blackhill