I have the following piece of code where I read a csv file and connect to the database. then I want to pass two columns from CSV file as variable to my query and eventually convert the result to a pd database.
I have tried different ways of binding and converted columns to list but I was unsuccessful. with this piece of code I get the following Error:
DatabaseError: DPY-4010: a bind variable replacement value
for placeholder ":HOUR" was not provided
or I get below error when I add this part to execute():
res = connection.cursor().execute("SELECT HOUR,UNITSCHEDULEID,VERSIONID,MINRUNTIME FROM
int_Stg.UnitScheduleOfferHourly WHERE Hour in :1 AND UnitScheduleId in :2", hour, unitid)
TypeError: Cursor.execute() takes from 2 to 3 positional arguments but 4 were given
the following is the code I execute:
import pandas as pd
import numpy as np
df = pd.read_csv(r"csv.csv")
df=df.dropna()
unitid=df['UNITSCHEDULEID'].unique()
hour=df['GMT_DATETIME'].unique()
import os, oracledb, csv, pyodbc
TNS_Admin = os.environ.get('TNS_Admin', r'\\corp.xxx\oracle')
oracledb.defaults.config_dir = TNS_Admin
pw = input(f'Enter password: ')
connection = oracledb.connect(user='xxxxx', password= pw, dsn="World")
res = connection.cursor().execute("SELECT HOUR,UNITSCHEDULEID,VERSIONID,MINRUNTIME FROM
int_Stg.UnitScheduleOfferHourly WHERE Hour in :Hour AND UnitScheduleId in :unitid").fetchall()
print(res)
connection.close()
As the Python OracleDB documentation for Binding Multiple Values to a SQL WHERE IN
Clause states, you need to generate a statement with a bind variable for every value in the arrays and then pass in those values to the bind variables:
query = """SELECT HOUR, UNITSCHEDULEID, VERSIONID, MINRUNTIME
FROM int_Stg.UnitScheduleOfferHourly
WHERE Hour in ({hour_binds})
AND UnitScheduleId in ({id_binds})""".format(
hour_binds=",".join((f":{idx}" for idx, _ in enumerate(hours, 1))),
id_binds=",".join((f":{idx}" for idx, _ in enumerate(unitid, len(hours) + 1))),
)
res = connection.cursor().execute(query, (*hours, *unitid)).fetchall()
print(res)
If you have more than 1000 elements in either list then split the list up into multiple IN
clauses.
def generate_sql_in_binds(
name: str,
size: int,
start: int = 1,
max_binds: int = 1000,
) -> str:
in_clauses = (
"{name} IN ({binds})".format(
name=name,
binds=",".join(
(
f":{b+start}"
for b in range(i, min(i+max_binds,size))
)
)
)
for i in range(0, size, max_binds)
)
return "(" + (" OR ".join(in_clauses)) + ")"
query = """SELECT HOUR, UNITSCHEDULEID, VERSIONID, MINRUNTIME
FROM int_Stg.UnitScheduleOfferHourly
WHERE {hour_binds}
AND {id_binds}""".format(
hour_binds=generate_sql_in_binds("hour", len(hours)),
id_binds=generate_sql_in_binds("UnitScheduleId", len(unitid), len(hours) + 1),
)
res = connection.cursor().execute(query, (*hours, *unitid)).fetchall()
print(res)