pythonpandas

Pandas Batch Update Account String


My organization has account numbers that are comprised of combining multiple fields. The last field is always 4 characters (typically 0000)

Org           Account
 01       01-123-0000
 01       01-456-0000
 02       02-789-0000
 02       02-456-0000
 03       03-987-0000
 03       03-123-1234

I also have a dictionary mapping of how many characters the last component should be.

MAP = {'01': 4, '02': 3, '03': 3}

However there also special mappings for Org 03:

D03_SPECIAL_MAP = {'0000': '012', '1234': '123'}

My code to update the last component is:

for i,r in df.iterrows():
    updated = False # Keep track if we have updated this row

    # Split off last component from the rest of the account
    Acct, last_comp = r['Account'].rsplit('-',1)

    # Check if we need to update code length and the code length does not match
    if r['Org'] in MAP and len(last_comp) != MGMT_MAP[r['Org']]:
        df.at[i,'Account'] = '-'.join(Acct) + "-" + last_comp.zfill(MAP[r['Org']])
        updated = True

    # Special mapping for Org 03
    if r['Org'] =='03' and last_comp in D03_SPECIAL_MAP.keys():
        df.at[i,'Account'] = '-'.join(Acct) + "-" + D03_SPECIAL_MAP[last_comp]
        updated = True

    if not updated: # Join Default if we have not hit either of he conditions above
        df.at[i,'Account'] = '-'.join(Acct) + "-" + last_comp

The output of this will be:

Org           Account
 01       01-123-0000
 01       01-456-0000
 02       02-789-000
 02       02-456-000
 03       03-987-012
 03       03-123-123

My code works as expected except this process is a little slow to check every record. Is there a way to perform the same operation without using df.iterrows()?


Solution

  • Since working with strings is hardly vectorizable, just use a simple python function and a list comprehension:

    def replace(org, account):  
        a, b = account.rsplit('-', maxsplit=1)
        if org == '03':
            return f'{a}-{D03_SPECIAL_MAP[b]}'
        return f'{a}-{b[:MAP[org]]}'
        
    df['Account'] = [replace(o, a) for o, a in zip(df['Org'], df['Account'])]
    

    Output:

      Org      Account
    0  01  01-123-0000
    1  01  01-456-0000
    2  02   02-789-000
    3  02   02-456-000
    4  03   03-987-012
    5  03   03-123-123