Using the following code I am able to obtain the list of transactions in the desired format
import re
import pdfplumber
import csv
line_re = re.compile(r"(\d{2}/\d{2}/\d{4}\s+\d{2}/\d{2}/\d{4}.+)$")
transactions = []
with pdfplumber.open('./Bank Acct statement.pdf') as pdf:
for page in pdf.pages:
text = page.extract_text()
lines = text.split('\n')
for line in lines:
if re.match(line_re, line):
transactions.append(line)
The above returns the following list of transactions:
26/01/2023 26/01/2023 Payment Received Z Kona 8 000.00 8 085.87
26/01/2023 26/01/2023 Banking App Payment: Sihle -2 000.00 6 085.87
26/01/2023 26/01/2023 Payment Fee -1.50 6 084.37
26/01/2023 26/01/2023 SMS Payment Notification Fee -0.25 6 084.12
26/01/2023 26/01/2023 Payment Received Z Kona 15 000.00 21 084.12
26/01/2023 26/01/2023 Payment Received Z Kona 1 500.00 22 584.12
26/01/2023 26/01/2023 Payment Received Z Kona 2 000.00 24 584.12
26/01/2023 26/01/2023 Banking App Transfer to Ms K Savings (1816578655) -18 500.00 6 084.12
However when trying to extract the components and writing to csv, the csv file is empty. This is the modified code where I try and do the entire process:
import re
import pdfplumber
import csv
line_re = re.compile(r"(\d{2}/\d{2}/\d{4}\s+\d{2}/\d{2}/\d{4}.+)$")
transactions = []
with pdfplumber.open('./Bank Acct statement.pdf') as pdf:
for page in pdf.pages:
text = page.extract_text()
lines = text.split('\n')
for line in lines:
if re.match(line_re, line):
transactions.append(line)
# Define headers for the CSV file
csv_headers = [
"Posting Date",
"Transaction Date",
"Description",
"Amount",
"Balance",
]
# Specify the CSV file path
csv_file_path = "transactions.csv"
# Write transactions to the CSV file
with open(csv_file_path, mode="w", newline="") as csv_file:
csv_writer = csv.writer(csv_file)
# Write the headers as the first row
csv_writer.writerow(csv_headers)
# Write each transaction as a row in the CSV file
for transaction in transactions:
# Split the transaction line into its components
match = re.search(
r"(\d{2}/\d{2}/\d{4})\s+(\d{2}/\d{2}/\d{4})\s+([\w\s\(\),]+)\s+(-?\d{1,10}\s\d{1,2}\.\d{2})\s+(-?\d{1,10}\s\d{1,2}\.\d{2})",
transaction,
)
if match:
csv_writer.writerow([match.group(1), match.group(2), match.group(3), match.group(4), match.group(5)])
print(f"Transactions saved to {csv_file_path}")
The CSV only returns the headers, and all text in the first column. What am I doing wrong? I am not the best at regex.
Your file is empty because your pattern does not match.
I have added named capture groups for clarity, your updated pattern could be:
^(?P<posting_date>\d{2}/\d{2}/\d{4})\s+(?P<transaction_date>\d{2}/\d{2}/\d{4})\s+(?P<description>[\w\s(),:]+?)\s+(?P<Amount>-?\d{1,3}(?:\s\d{3})*\.\d{2})\s+(?P<Balance>-?\d{1,3}(?:\s\d{3})*\.\d{2})$
The pattern matches:
^
Start of string(?P<posting_date>\d{2}/\d{2}/\d{4})\s+
Match a date like pattern(?P<transaction_date>\d{2}/\d{2}/\d{4})\s+
Same pattern for the transaction date(?P<description>[\w\s(),:]+?)\s+
Match the allowed characters, as few as possible(?P<Amount>-?\d{1,3}(?:\s\d{3})*\.\d{2})\s+
Match the amount ending on .
and 2 digits with a whitespace char between 3 digits and leading 1-3 digits(?P<Balance>-?\d{1,3}(?:\s\d{3})*\.\d{2})
Same pattern for the balance$
End of stringSee a regex demo.
The updated part of the code:
match = re.search(
r"(?P<posting_date>\d{2}/\d{2}/\d{4})\s+(?P<transaction_date>\d{2}/\d{2}/\d{4})\s+(?P<description>[\w\s(),:]+?)\s+(?P<Amount>-?\d{1,3}(?:\s\d{3})*\.\d{2})\s+(?P<Balance>-?\d{1,3}(?:\s\d{3})*\.\d{2})$",
transaction,
)
The content of the transactions.csv
file:
Posting Date,Transaction Date,Description,Amount,Balance
26/01/2023,26/01/2023,Payment Received Z Kona,8 000.00,8 085.87
26/01/2023,26/01/2023,Banking App Payment: Sihle,-2 000.00,6 085.87
26/01/2023,26/01/2023,Payment Fee,-1.50,6 084.37
26/01/2023,26/01/2023,SMS Payment Notification Fee,-0.25,6 084.12
26/01/2023,26/01/2023,Payment Received Z Kona,15 000.00,21 084.12
26/01/2023,26/01/2023,Payment Received Z Kona,1 500.00,22 584.12
26/01/2023,26/01/2023,Payment Received Z Kona,2 000.00,24 584.12
26/01/2023,26/01/2023,Banking App Transfer to Ms K Savings (1816578655),-18 500.00,6 084.12