pythonpandasoracle-databasepython-oracledb

How can I pass a variable from CSV file to Oracle SQL query fetch in Python?


I have the following piece of code where I read a csv file and connect to the database. then I want to pass two columns from CSV file as variable to my query and eventually convert the result to a pd database.

I have tried different ways of binding and converted columns to list but I was unsuccessful. with this piece of code I get the following Error:

DatabaseError: DPY-4010: a bind variable replacement value  
for placeholder ":HOUR" was not provided

or I get below error when I add this part to execute():

res = connection.cursor().execute("SELECT HOUR,UNITSCHEDULEID,VERSIONID,MINRUNTIME FROM 
 int_Stg.UnitScheduleOfferHourly WHERE  Hour in :1 AND UnitScheduleId in :2", hour, unitid)
TypeError: Cursor.execute() takes from 2 to 3 positional arguments but 4 were given

the following is the code I execute:

import pandas as pd
import numpy as np

df = pd.read_csv(r"csv.csv") 

df=df.dropna()

unitid=df['UNITSCHEDULEID'].unique()
hour=df['GMT_DATETIME'].unique()

import os, oracledb, csv, pyodbc
TNS_Admin = os.environ.get('TNS_Admin', r'\\corp.xxx\oracle')
oracledb.defaults.config_dir = TNS_Admin
pw = input(f'Enter password: ')
connection = oracledb.connect(user='xxxxx', password= pw, dsn="World")

res = connection.cursor().execute("SELECT HOUR,UNITSCHEDULEID,VERSIONID,MINRUNTIME FROM 
 int_Stg.UnitScheduleOfferHourly WHERE  Hour in :Hour AND UnitScheduleId in :unitid").fetchall()
print(res)

connection.close()


Solution

  • As the Python OracleDB documentation for Binding Multiple Values to a SQL WHERE IN Clause states, you need to generate a statement with a bind variable for every value in the arrays and then pass in those values to the bind variables:

    query = """SELECT HOUR, UNITSCHEDULEID, VERSIONID, MINRUNTIME
    FROM  int_Stg.UnitScheduleOfferHourly
    WHERE Hour in ({hour_binds})
    AND   UnitScheduleId in ({id_binds})""".format(
      hour_binds=",".join((f":{idx}" for idx, _ in enumerate(hours, 1))),
      id_binds=",".join((f":{idx}" for idx, _ in enumerate(unitid, len(hours) + 1))),
    )
    
    res = connection.cursor().execute(query, (*hours, *unitid)).fetchall()
    print(res)
    

    If you have more than 1000 elements in either list then split the list up into multiple IN clauses.

    def generate_sql_in_binds(
        name: str,
        size: int,
        start: int = 1,
        max_binds: int = 1000,
    ) -> str:
        in_clauses = (
            "{name} IN ({binds})".format(
                name=name,
                binds=",".join(
                    (
                        f":{b+start}"
                        for b in range(i, min(i+max_binds,size))
                    )
                )
            )
            for i in range(0, size, max_binds)
        )
        return "(" + (" OR ".join(in_clauses)) + ")"
    
    query = """SELECT HOUR, UNITSCHEDULEID, VERSIONID, MINRUNTIME
    FROM  int_Stg.UnitScheduleOfferHourly
    WHERE {hour_binds}
    AND   {id_binds}""".format(
      hour_binds=generate_sql_in_binds("hour", len(hours)),
      id_binds=generate_sql_in_binds("UnitScheduleId", len(unitid), len(hours) + 1),
    )
    
    res = connection.cursor().execute(query, (*hours, *unitid)).fetchall()
    print(res)