I have a directory full of CSVs that need to be imported into different tables of a SQL Server database. Fortunately, each CSV's filename starts with the string "Concat_AAAAA_XX...", where AAAAA is an alphanumeric string and XX is a two-digit integer. Together they act as the key identifying a specific table in SQL.
My question is: what would be the most elegant way to write a Python script that takes the AAAAA and XX values from each filename and knows which table to import that data into?
CSV1 named: Concat_T101_14_20072021.csv
would need to be imported into Table A
CSV2 named: Concat_RB728_06_25072021.csv
would need to be imported into Table B
CSV3 named: Concat_T144_21_27072021.csv
would need to be imported into Table C
and so on...
I've read that the ConfigParser package may be able to help, but I can't work out how to apply it here. The reason for suggesting ConfigParser is that I'd like the flexibility of editing a config file (e.g. "CONFIG.INI") rather than having to hard-code new entries into the Python script.
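For illustration, this is roughly the kind of CONFIG.INI I have in mind, so new mappings can be added without touching the script (the section name and table names are just placeholders):

[tables]
T101_14 = Table_A
RB728_06 = Table_B
T144_21 = Table_C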
The code I have so far works for just a standalone dataset, which can be found here.
Here is the code I'm using:
import pypyodbc as odbc
import pandas as pd
import os

os.chdir('SQL Loader')

df = pd.read_csv('Real-Time_Traffic_Incident_Reports.csv')
# Normalise the two datetime columns to SQL Server's datetime format
df['Published Date'] = pd.to_datetime(df['Published Date']).dt.strftime('%Y-%m-%d %H:%M:%S')
df['Status Date'] = pd.to_datetime(df['Status Date']).dt.strftime('%Y-%m-%d %H:%M:%S')
# Drop rows with no location or status
df.drop(df.query('Location.isnull() | Status.isnull()').index, inplace=True)

columns = ['Traffic Report ID', 'Published Date', 'Issue Reported', 'Location',
           'Address', 'Status', 'Status Date']
df_data = df[columns]
records = df_data.values.tolist()

DRIVER = 'SQL Server'
SERVER_NAME = 'MY SERVER'
DATABASE_NAME = 'MYDATABASE'

def connection_string(driver, server_name, database_name):
    conn_string = f"""
        DRIVER={{{driver}}};
        SERVER={server_name};
        DATABASE={database_name};
        Trusted_Connection=yes;
    """
    return conn_string

try:
    conn = odbc.connect(connection_string(DRIVER, SERVER_NAME, DATABASE_NAME))
except odbc.DatabaseError as e:
    print('Database Error:')
    print(str(e))
except odbc.Error as e:
    print('Connection Error:')
    print(str(e))

sql_insert = '''
    INSERT INTO Austin_Traffic_Incident
    VALUES (?, ?, ?, ?, ?, ?, ?, GETDATE())
'''

try:
    cursor = conn.cursor()
    cursor.executemany(sql_insert, records)
    cursor.commit()
except Exception as e:
    cursor.rollback()
    print(str(e))
finally:
    print('Task is complete.')
    cursor.close()
    conn.close()
You can do the translation with a dict, for example:
import re
from glob import glob

# map the "AAAAA_XX" part of each filename to its target table
translation_table = {
    'T101_14': 'A',
    'RB728_06': 'B',
    'T144_21': 'C'
}

# get all csv files from the current directory
for filename in glob("*.csv"):
    # extract the AAAAA_XX key with a regular expression
    # (can also be done easily with the split function)
    filekey = re.match(r"^Concat_([A-Za-z0-9]+_[0-9]{2})_[0-9]{8}\.csv$", filename).group(1)
    # use the translation table to get the table name
    tablename = translation_table[filekey]
    print(f"Data from file '{filename}' goes to table '{tablename}'")