pythonpython-3.xpyodbcxlrd

Importing data from an excel file using python into SQL Server


I have found some other questions that have a similar error to what I am getting, but have not been able to figure out how to resolve this based on the answers. I am trying to import an excel file into SQL Server with the help of python. This is the code I wrote:

import pandas as pd
import numpy as np
import pandas.io.sql
import pyodbc
import xlrd

server = "won't disclose private info"
db = 'private info'
conn = pyodbc.connect('DRIVER={SQL Server};SERVER=' + Server + ';DATABASE=' + 
db + ';Trusted_Connection=yes')

cursor = conn.cursor()
book = xlrd.open_workbook("Daily Flash.xlsx")
sheet = book.sheet_by_name("Sheet1")

query1 = """CREATE TABLE [LEAF].[MK] ([LEAF][Lease_Number] varchar(255), 
[LEAF][Start_Date] varchar(255), [LEAF][Report_Status] varchar(255), [LEAF] 
[Status_Date] varchar(255), [LEAF][Current_Status] varchar(255), [LEAF] 
[Sales_Rep] varchar(255), [LEAF][Customer_Name] varchar(255),[LEAF] 
[Total_Finance] varchar(255),
[LEAF][Rate_Class] varchar(255) ,[LEAF][Supplier_Name] varchar(255) ,[LEAF] 
[DecisionStatus] varchar(255))"""


query = """INSERT INTO [LEAF].[MK] (Lease_Number, Start_Date, Report_Status, 
Status_Date, Current_Status, Sales_Rep, Customer_Name,Total_Finance,
Rate_Class,Supplier_Name,DecisionStatus) VALUES (%s, %s, %s, %s, %s, %s, %s, 
%s, %s, %s, %s)"""

for r in range(1, sheet.nrows):
    Lease_Number  = sheet.cell(r,0).value
    Start_Date    = sheet.cell(r,1).value
    Report_Status = sheet.cell(r,2).value
    Status_Date   = sheet.cell(r,3).value
    Current_Status= sheet.cell(r,4).value
    Sales_Rep     = sheet.cell(r,5).value
    Customer_Name = sheet.cell(r,6).value
    Total_Financed= sheet.cell(r,7).value
    Rate_Class    = sheet.cell(r,8).value
    Supplier_Name = sheet.cell(r,9).value
    DecisionStatus= sheet.cell(r,10).value


    values = (Lease_Number, Start_Date, Report_Status, Status_Date, 
    Current_Status, Sales_Rep, Customer_Name, Total_Financed, Rate_Class, 
    Supplier_Name, DecisionStatus)

    cursor.execute(query1)

    cursor.execute(query, values)


database.commit()


database.close()


database.commit()

The error message I get is:

ProgrammingError                          Traceback (most recent call last)
<ipython-input-24-c525ebf0af73> in <module>()
 16 
 17     # Execute sql Query
 ---> 18     cursor.execute(query, values)
 19 
 20 # Commit the transaction

 ProgrammingError: ('The SQL contains 0 parameter markers, but 11 parameters 
 were supplied', 'HY000')

Can someone please explain the problem to me and how I can fix it? Thank you!

Update:

I have gotten that error message to go away based on the comments below. I modified my query also because the table into which I am trying to insert values into was not previously created, so I updated my code in an attempt to create it.

However, now I am getting the error message:

ProgrammingError: ('42000', '[42000] [Microsoft][ODBC SQL Server Driver][SQL 
Server]The specified schema name "dbo" either does not exist or you do not 
have permission to use it. (2760) (SQLExecDirectW)')

I tried changing that slightly by writing CREATE [HELLO][MK] instead of just CREATE MK but that tells me that MK is already in the database... What steps should I take next?


Solution

  • Based on the conversation we had in our chat, here are a few takeaways:

    1. After executing your CREATE TABLE query, make sure to commit it immediately before running any subsequent INSERT queries.
    2. Use error catching for cases when the table already exists in the database. You asked that if you wanted to import more data to the table, would the script still run. The answer is no, since Python will throw an exception at cursor.execute(query1).
    3. If you want to validate whether your insert operations were successful, you can do a simple record count check.

    EDIT Yesterday, when I had @mkheifetz test my code out, he caught a minor bug where the validation check would return False, and the reason was because the database already had existing records, so when comparing against only the current data being imported, the validation would fail. Therefore, as a solution to address the bug, I have modified the code again.

    Below is how I would modify your code:

    import pandas as pd
    import numpy as np
    import seaborn as sns
    import scipy.stats as stats
    import matplotlib.pyplot as plt
    
    import pandas.io.sql
    import pyodbc
    
    import xlrd
    server = 'XXXXX'
    db = 'XXXXXdb'
    
    # create Connection and Cursor objects
    conn = pyodbc.connect('DRIVER={SQL Server};SERVER=' + server + ';DATABASE=' + db + ';Trusted_Connection=yes')
    cursor = conn.cursor()
    
    # read data
    data = pd.read_excel('Flash Daily Apps through 070918.xls')
    
    # rename columns
    data = data.rename(columns={'Lease Number': 'Lease_Number',
                                'Start Date': 'Start_Date',
                                'Report Status': 'Report_Status',
                                'Status Date': 'Status_Date',
                                'Current Status': 'Current_Status',
                                'Sales Rep': 'Sales_Rep',
                                'Customer Name': 'Customer_Name',
                                'Total Financed': 'Total_Financed',
                                'Rate Class': 'Rate_Class',
                                'Supplier Name': 'Supplier_Name'})
    
    # export
    data.to_excel('Daily Flash.xlsx', index=False)
    
    # Open the workbook and define the worksheet
    book = xlrd.open_workbook("Daily Flash.xlsx")
    sheet = book.sheet_by_name("Sheet1")
    
    query1 = """
    CREATE TABLE [LEAF].[ZZZ] (
        Lease_Number varchar(255),
        Start_Date varchar(255),
        Report_Status varchar(255),
        Status_Date varchar(255),
        Current_Status varchar(255),
        Sales_Rep varchar(255),
        Customer_Name varchar(255),
        Total_Finance varchar(255),
        Rate_Class varchar(255),
        Supplier_Name varchar(255),
        DecisionStatus varchar(255)
    )"""
    
    query = """
    INSERT INTO [LEAF].[ZZZ] (
        Lease_Number,
        Start_Date,
        Report_Status,
        Status_Date,
        Current_Status,
        Sales_Rep,
        Customer_Name,
        Total_Finance,
        Rate_Class,
        Supplier_Name,
        DecisionStatus
    ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)"""
    
    # execute create table
    try:
        cursor.execute(query1)
        conn.commit()
    except pyodbc.ProgrammingError:
        pass
    
    # grab existing row count in the database for validation later
    cursor.execute("SELECT count(*) FROM LEAF.ZZZ")
    before_import = cursor.fetchone()
    
    for r in range(1, sheet.nrows):
        Lease_Number = sheet.cell(r,0).value
        Start_Date = sheet.cell(r,1).value
        Report_Status = sheet.cell(r,2).value
        Status_Date = sheet.cell(r,3).value
        Current_Status= sheet.cell(r,4).value
        Sales_Rep = sheet.cell(r,5).value
        Customer_Name = sheet.cell(r,6).value
        Total_Financed= sheet.cell(r,7).value
        Rate_Class = sheet.cell(r,8).value
        Supplier_Name = sheet.cell(r,9).value
        DecisionStatus= sheet.cell(r,10).value
    
        # Assign values from each row
        values = (Lease_Number, Start_Date, Report_Status, Status_Date, Current_Status,
                  Sales_Rep, Customer_Name, Total_Financed, Rate_Class, Supplier_Name,
                  DecisionStatus)
    
        # Execute sql Query
        cursor.execute(query, values)
    
    # Commit the transaction
    conn.commit()
    
    # If you want to check if all rows are imported
    cursor.execute("SELECT count(*) FROM LEAF.ZZZ")
    result = cursor.fetchone()
    
    print((result[0] - before_import[0]) == len(data.index))  # should be True
    
    # Close the database connection
    conn.close()