pythoniban

str,contains(patern) to write patter to df column


I have a data frame (df_bv_tikkie):

Omschrijving IBAN_Tik
/TRTP/iDEAL/IBAN//BIC/ABNANL2A/NAME/name via Tikkie/REMI/00345346356 3253636 Tikkie NL30ABNA9999999991/EREF/11-07-2024 03:32 0031039140883718 None
SEPA Overboeking IBAN: BIC: NL30ABNA9999999992 Naam: AAB RETAIL INZ TIKKIE Omschrijving: Tikkie ID 390853985 1771 None

I am trying to extract the IBAN grom the 'Omschrijving' column and to write it into the 'IBAN_Tik' column.

The desired result is:

Omschrijving IBAN_Tik
/TRTP/iDEAL/IBAN//BIC/ABNANL2A/NAME/name via Tikkie/REMI/00345346356 3253636 Tikkie NL30ABNA9999999991/EREF/11-07-2024 03:32 0031039140883718 NL30ABNA9999999991
SEPA Overboeking IBAN: BIC: NL30ABNA9999999992 Naam: AAB RETAIL INZ TIKKIE Omschrijving: Tikkie ID 390853985 1771 NL30ABNA9999999992

I have first tried:

import re
pattern = r'^NL[0-9]{2}[A-z0-9]{4}[0-9]{10}$'

df_bv_tikkie = df_bv_tikkie.reset_index()  
index_Omschrijving = df_bv_tikkie.columns.get_loc('Omschrijving')

df_bv_tikkie['IBAN_Tik'] = None

index_iban = df_bv_tikkie.columns.get_loc('IBAN_Tik')

for row in range(0, len(df_bv_tikkie)):
  iban = re.search(pattern, df_bv_tikkie.iat[row, index_Omschrijving])
  print(iban)
  df_bv_tikkie.iat[row, index_iban] = iban

This resulted only in None values in 'IBAN_Tik'

Then I tried

df_bv_tikkie['Omschrijving'].str.contains(pattern)

This also results in None values, as if the pattern is not recognised

However if I apply the pattern to a series

import re
pattern = r'^NL[0-9]{2}[A-z0-9]{4}[0-9]{10}$'

testiban2 = ['/TRTP/iDEAL/IBAN//BIC/ABNANL2A/NAME/name via Tikkie/REMI/00345346356 3253636 Tikkie NL30ABNA9999999991/EREF/11-07-2024 03:32 0031039140883718', 
'SEPA Overboeking IBAN: BIC: NL30ABNA9999999992 Naam: AAB RETAIL INZ TIKKIE Omschrijving: Tikkie ID 390853985  1771']



for x in testiban:
  result = re.search(pattern, x)
  if result:
    print(x)

this will print

NL30ABNA9999999991

NL30ABNA9999999992

--> suggesting that the pattern is accurate so I don understand why my first two attempts do not recognise the pattern and or don print it to the column


Solution

  • You should remove the anchors ^ and $ from the regexp, since they restrict matches to the entire string, not a portion.

    Then use .str.exctract() to get the part of the Omschrijving field that matches the pattern. Since this returns the parts of the string matching capture groups, you need to wrap the pattern in ().

    pattern = r'(NL[0-9]{2}[A-z0-9]{4}[0-9]{10})'
    df_bv_tikkie['IBAN_Tik'] = df_bv_tikkie['Omschrijving'].str.extract(pattern)