pythonsqlpandaspyodbcpypyodbc

Is there a way with pandas to use read_sql() with an sql-statement, that takes multiple variables?


this is a part of my code. I already have a database with values in it, and need to create a new dataframe based on the values out of this dataframe (at the moment these values are only 12 & 13, stored in the header_row_ids variable).

When I run the script, I get the following error message:

pandas.io.sql.DatabaseError: Execution failed on sql 'SELECT * FROM dbo.TestDetails WHERE Id IN (?,?), (13, 12)': ('07002', '[07002] [Microsoft][ODBC SQL Server Driver]COUNT field incorrect or syntax error')

Can someone help me fix this issue? I would be very grateful.

import pypyodbc
import pandas

sourceConnection = pypyodbc.connect(
            self.sourceDriver +
            self.sourceServer +
            self.sourceDatabase
            )

placeholders = ",".join("?" * (len(self.header_row_ids)))
sql_source_detail_select = "SELECT * FROM dbo.TestDetails WHERE Id IN (%s)" % placeholders

header_row_ids = [12, 13]
header_row_ids_string = str(self.header_row_ids).strip('[]')

new_sql_source_detail_select = sql_source_detail_select + ", (" + header_row_ids_string + ")"

dataframe = pandas.read_sql(new_sql_source_detail_select, sourceConnection)

Solution

  • read_sql accepts a params argument that you can use to pass the parameter values:

    # header_row_ids = [12, 13]
    # sql = "SELECT * FROM dbo.TestDetails WHERE Id IN (?,?)"
    
    dataframe = pandas.read_sql(sql, sourceConnection, params=header_row_ids)