python, sql-server, pandas, configparser, pypyodbc

Import CSVs into different SQL tables


I have a directory full of CSVs that need to be imported into different tables of a SQL Server database. Fortunately, each filename starts with the string "Concat_AAAAA_XX...", where AAAAA is an alphanumeric string and XX is a two-digit integer. Both act as keys for a specific table in SQL.

My question is: what would be the most elegant way to write a Python script that takes the AAAAA and XX values from each filename and knows which table to import that data into?

CSV1 named: Concat_T101_14_20072021.csv
would need to be imported into Table A

CSV2 named: Concat_RB728_06_25072021.csv
would need to be imported into Table B

CSV3 named: Concat_T144_21_27072021.csv
would need to be imported into Table C

and so on...
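For reference, the two key parts can be pulled out of a filename with a plain split (the filenames here are just the examples above):

```python
def filename_keys(filename):
    # "Concat_T101_14_20072021.csv" -> ("T101", "14")
    parts = filename.split('_')
    return parts[1], parts[2]

print(filename_keys('Concat_T101_14_20072021.csv'))  # ('T101', '14')
```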

I've read that the ConfigParser package may be able to help, but I can't work out how to apply it here. The reason for suggesting ConfigParser is that I'd like the flexibility of editing a config file (e.g. "CONFIG.INI") rather than having to hard-code new entries into the Python script.
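For what it's worth, a minimal sketch of how ConfigParser could hold the filename-key-to-table mapping; the `[tables]` section name and table names are assumptions, and the INI contents are inlined here so the sketch is self-contained (in practice you would call `config.read('CONFIG.INI')`):

```python
import configparser

# Hypothetical CONFIG.INI contents, inlined for the sketch
INI = """
[tables]
T101_14 = Table_A
RB728_06 = Table_B
T144_21 = Table_C
"""

config = configparser.ConfigParser()
config.read_string(INI)          # in the real script: config.read('CONFIG.INI')

key = 'T101_14'                  # the AAAAA_XX part extracted from a filename
tablename = config['tables'][key]
print(tablename)                 # Table_A
```

New tables then only require a new line in CONFIG.INI, not a code change.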

The code I have so far only works for a single standalone dataset, which can be found here.

Here is the code I'm using:

import pypyodbc as odbc
import pandas as pd 
import os

os.chdir('SQL Loader')
df = pd.read_csv('Real-Time_Traffic_Incident_Reports.csv')

df['Published Date'] = pd.to_datetime(df['Published Date']).dt.strftime('%Y-%m-%d %H:%M:%S')
df['Status Date'] = pd.to_datetime(df['Status Date']).dt.strftime('%Y-%m-%d %H:%M:%S')

df.drop(df.query('Location.isnull() | Status.isnull()').index, inplace=True)

columns = ['Traffic Report ID', 'Published Date', 'Issue Reported', 'Location', 
            'Address', 'Status', 'Status Date']

df_data = df[columns]
records = df_data.values.tolist()

DRIVER = 'SQL Server'
SERVER_NAME = 'MY SERVER'
DATABASE_NAME = 'MYDATABASE'

def connection_string(driver, server_name, database_name):
    conn_string = f"""
        DRIVER={{{driver}}};
        SERVER={server_name};
        DATABASE={database_name};
        Trusted_Connection=yes;
    """
    return conn_string

try:
    conn = odbc.connect(connection_string(DRIVER, SERVER_NAME, DATABASE_NAME))
except odbc.DatabaseError as e:
    print('Database Error:')    
    print(str(e.value[1]))
except odbc.Error as e:
    print('Connection Error:')
    print(str(e.value[1]))


sql_insert = '''
    INSERT INTO Austin_Traffic_Incident 
    VALUES (?, ?, ?, ?, ?, ?, ?, GETDATE())
'''

try:
    cursor = conn.cursor()
    cursor.executemany(sql_insert, records)
    cursor.commit()
except Exception as e:
    cursor.rollback()
    print(str(e))
finally:
    print('Task is complete.')
    cursor.close()
    conn.close()

Solution

  • You can build a translation table using a dict:

    import re
    from glob import glob
    
    translation_table = {
        '14': 'A', 
        '06': 'B',
        '21': 'C'
        }
    
    # get all csv files from current directory
    for filename in glob("*.csv"):
    
        # extract the two-digit XX part with a regular expression
        # (can also be done easily with the split function)
        filenum = re.match(r"^Concat_[A-Za-z0-9]+_([0-9]{2})_[0-9]{8}\.csv$", filename).group(1)
    
        # use the translation table to get the table name
        tablename = translation_table[filenum]
        
        print(f"Data from file '{filename}' goes to table '{tablename}'")
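Combining this with ConfigParser avoids the hard-coded dict entirely. A sketch under the same assumptions as the question (the `[tables]` section name and table names are hypothetical; the INI contents are inlined here so the example runs, whereas the real script would call `config.read('CONFIG.INI')` and get the filenames from `glob("*.csv")`):

```python
import configparser
import re

# Hypothetical CONFIG.INI mapping "AAAAA_XX" keys to table names
INI = """
[tables]
T101_14 = Table_A
RB728_06 = Table_B
T144_21 = Table_C
"""

config = configparser.ConfigParser()
config.read_string(INI)

def table_key(filename):
    """Extract the AAAAA_XX key from names like Concat_T101_14_20072021.csv."""
    m = re.match(r"^Concat_([A-Za-z0-9]+)_([0-9]{2})_[0-9]{8}\.csv$", filename)
    return f"{m.group(1)}_{m.group(2)}" if m else None

# In the real script the filenames would come from glob("*.csv")
for filename in ['Concat_T101_14_20072021.csv', 'Concat_RB728_06_25072021.csv']:
    key = table_key(filename)
    if key is None:
        continue  # skip files that don't follow the naming convention
    tablename = config['tables'][key]
    print(f"Data from file '{filename}' goes to table '{tablename}'")
```

Each `tablename` can then be interpolated into the `INSERT` statement (validate it against a known list first, since table names can't be passed as `?` parameters).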