My organization has account numbers that are comprised of combining multiple fields. The last field is always 4 characters (typically 0000)
Org Account
01 01-123-0000
01 01-456-0000
02 02-789-0000
02 02-456-0000
03 03-987-0000
03 03-123-1234
I also have a dictionary mapping of how many characters the last component should be.
MAP = {'01': 4, '02': 3, '03': 3}
However there also special mappings for Org 03:
D03_SPECIAL_MAP = {'0000': '012', '1234': '123'}
My code to update the last component is:
for i,r in df.iterrows():
updated = False # Keep track if we have updated this row
# Split off last component from the rest of the account
Acct, last_comp = r['Account'].rsplit('-',1)
# Check if we need to update code length and the code length does not match
if r['Org'] in MAP and len(last_comp) != MGMT_MAP[r['Org']]:
df.at[i,'Account'] = '-'.join(Acct) + "-" + last_comp.zfill(MAP[r['Org']])
updated = True
# Special mapping for Org 03
if r['Org'] =='03' and last_comp in D03_SPECIAL_MAP.keys():
df.at[i,'Account'] = '-'.join(Acct) + "-" + D03_SPECIAL_MAP[last_comp]
updated = True
if not updated: # Join Default if we have not hit either of he conditions above
df.at[i,'Account'] = '-'.join(Acct) + "-" + last_comp
The output of this will be:
Org Account
01 01-123-0000
01 01-456-0000
02 02-789-000
02 02-456-000
03 03-987-012
03 03-123-123
My code works as expected except this process is a little slow to check every record. Is there a way to perform the same operation without using df.iterrows()
?
Since working with strings is hardly vectorizable, just use a simple python function and a list comprehension:
def replace(org, account):
a, b = account.rsplit('-', maxsplit=1)
if org == '03':
return f'{a}-{D03_SPECIAL_MAP[b]}'
return f'{a}-{b[:MAP[org]]}'
df['Account'] = [replace(o, a) for o, a in zip(df['Org'], df['Account'])]
Output:
Org Account
0 01 01-123-0000
1 01 01-456-0000
2 02 02-789-000
3 02 02-456-000
4 03 03-987-012
5 03 03-123-123