I have a data frame (df_bv_tikkie):
Omschrijving | IBAN_Tik |
---|---|
/TRTP/iDEAL/IBAN//BIC/ABNANL2A/NAME/name via Tikkie/REMI/00345346356 3253636 Tikkie NL30ABNA9999999991/EREF/11-07-2024 03:32 0031039140883718 | None |
SEPA Overboeking IBAN: BIC: NL30ABNA9999999992 Naam: AAB RETAIL INZ TIKKIE Omschrijving: Tikkie ID 390853985 1771 | None |
I am trying to extract the IBAN grom the 'Omschrijving' column and to write it into the 'IBAN_Tik' column.
The desired result is:
Omschrijving | IBAN_Tik |
---|---|
/TRTP/iDEAL/IBAN//BIC/ABNANL2A/NAME/name via Tikkie/REMI/00345346356 3253636 Tikkie NL30ABNA9999999991/EREF/11-07-2024 03:32 0031039140883718 | NL30ABNA9999999991 |
SEPA Overboeking IBAN: BIC: NL30ABNA9999999992 Naam: AAB RETAIL INZ TIKKIE Omschrijving: Tikkie ID 390853985 1771 | NL30ABNA9999999992 |
I have first tried:
import re
pattern = r'^NL[0-9]{2}[A-z0-9]{4}[0-9]{10}$'
df_bv_tikkie = df_bv_tikkie.reset_index()
index_Omschrijving = df_bv_tikkie.columns.get_loc('Omschrijving')
df_bv_tikkie['IBAN_Tik'] = None
index_iban = df_bv_tikkie.columns.get_loc('IBAN_Tik')
for row in range(0, len(df_bv_tikkie)):
iban = re.search(pattern, df_bv_tikkie.iat[row, index_Omschrijving])
print(iban)
df_bv_tikkie.iat[row, index_iban] = iban
This resulted only in None values in 'IBAN_Tik'
Then I tried
df_bv_tikkie['Omschrijving'].str.contains(pattern)
This also results in None values, as if the pattern is not recognised
However if I apply the pattern to a series
import re
pattern = r'^NL[0-9]{2}[A-z0-9]{4}[0-9]{10}$'
testiban2 = ['/TRTP/iDEAL/IBAN//BIC/ABNANL2A/NAME/name via Tikkie/REMI/00345346356 3253636 Tikkie NL30ABNA9999999991/EREF/11-07-2024 03:32 0031039140883718',
'SEPA Overboeking IBAN: BIC: NL30ABNA9999999992 Naam: AAB RETAIL INZ TIKKIE Omschrijving: Tikkie ID 390853985 1771']
for x in testiban:
result = re.search(pattern, x)
if result:
print(x)
this will print
NL30ABNA9999999991
NL30ABNA9999999992
--> suggesting that the pattern is accurate so I don understand why my first two attempts do not recognise the pattern and or don print it to the column
You should remove the anchors ^
and $
from the regexp, since they restrict matches to the entire string, not a portion.
Then use .str.exctract()
to get the part of the Omschrijving
field that matches the pattern. Since this returns the parts of the string matching capture groups, you need to wrap the pattern in ()
.
pattern = r'(NL[0-9]{2}[A-z0-9]{4}[0-9]{10})'
df_bv_tikkie['IBAN_Tik'] = df_bv_tikkie['Omschrijving'].str.extract(pattern)