I am using SQL Server, pandas, and pyodbc.
I am trying to write a function that takes a column name, a table name, another column name (for a GUID-type column), and a GUID value, and builds an SQL call like SELECT columnName1 FROM tableName WHERE columnName2 = guidValue.
This function was working just fine when I wasn't worrying about SQL injection and I was just manually sticking square brackets where needed:
def get_sql_entity_set_data(entity_set_name, filter_expression=None, query_params='*'):
    connString = get_connection_string()
    conn = pyodbc.connect(connString)
    query = "SELECT " + query_params + " FROM [" + entity_set_name + "]"
    if filter_expression:
        query += " WHERE " + filter_expression
    logger.debug(f'Attempting the following SQL command: {query}.')
    retValue = pd.read_sql(query, conn)
    returnCount = len(retValue.index)
    if returnCount == 0:
        logger.warning('Fetch failed: no data retrieved.')
    else:
        logger.info(f'{returnCount} row(s) retrieved.')
    return retValue
(When I was doing it that way, I would pass the full WHERE clause as a parameter - e.g., "[Name] = 'Bob Smith'".)
But now I am trying to protect against SQL injection and having a devil of a time. Nothing I try works. I have learned that you can't parameterize table names. I also learned that you can't pass the asterisk '*' as a parameter - e.g., SELECT ? FROM... - so my function got a lot uglier with if logic to handle whether the SELECT column name was specified in the parameters or it was just defaulted to '*'.
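The if logic was basically this shape (a simplified sketch, not my exact code; bracket_identifier is just an illustrative stand-in for however the names get validated and bracketed):

# Simplified sketch of the branching: '*' can't be bound with ?, and identifiers
# can't be parameterized at all, so they have to be validated and bracketed here.
import re

def bracket_identifier(name):
    # allow-list check, then wrap in square brackets (illustrative helper only)
    if not re.fullmatch(r'[A-Za-z0-9_ ]+', name):
        raise ValueError(f'Refusing suspicious identifier: {name}')
    return f'[{name}]'

def build_select(entity_set_name, query_params='*'):
    if query_params == '*':
        select_list = '*'   # literal asterisk goes straight into the SQL text
    else:
        # explicit column name(s), each validated and bracketed
        select_list = ', '.join(bracket_identifier(c.strip()) for c in query_params.split(','))
    return f'SELECT {select_list} FROM {bracket_identifier(entity_set_name)}'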
I was under the impression that I would be able to work around the can't-parameterize-table-name issue by making a prepared statement using EXEC, so I have tried various prepared statements built up from this basic idea (just trying to come up with a simplified example off the top of my head)...
DECLARE @table nvarchar(128);
DECLARE @query nvarchar(max);
SET @table = ?;
SET @query = 'SELECT * FROM @table';
EXEC @query;
...but that didn't work.
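(In hindsight, two things were wrong with that attempt: @table inside the quoted string is never substituted, because variables aren't expanded inside string literals, and EXEC @query without parentheses treats the contents of @query as a stored-procedure name rather than a batch to execute. A minimal sketch of the shape that does work, building the string with QUOTENAME and running it through sp_executesql - I'm assuming here that the table name can be bound into the DECLARE with a ?; the fuller version I ended up with at the bottom of this post interpolates a quoted literal instead:)

# Minimal sketch (assumes conn is an open pyodbc connection): the ? is bound into
# a DECLARE'd variable, the identifier is escaped server-side with QUOTENAME, and
# the finished string is executed by sp_executesql.
sql = """
SET NOCOUNT ON;
DECLARE @table nvarchar(128) = ?;
DECLARE @query nvarchar(max) = 'SELECT * FROM ' + QUOTENAME(@table);
EXEC sp_executesql @query;
"""
df = pd.read_sql(sql, conn, params=('mytable',))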
At this point, I have given up on that approach and just put the table name right in the query string as before. The only thing I'm doing differently is using another function to wrap it in square brackets. This is the state of my code right now, and it's still not working (no error, it just returns 0 results, even though I have confirmed in SSMS that the parameters I'm feeding it should get 1 result):
def get_data_with_guid_filter(entity_set_name, filter_col, filter_val):
    connString = get_connection_string()
    conn = pyodbc.connect(connString)
    logger.debug(f'Attempting to fetch all data from {entity_set_name} where {filter_col} = {filter_val}.')
    entity_set_name = paramaterize(entity_set_name, "sysid")
    filter_col = paramaterize(filter_col, "sysid")
    filter_val = paramaterize(filter_val, "guid")
    params_tuple = (filter_col,) + (filter_val,)
    query = f"SELECT * FROM {entity_set_name} WHERE ? = ?"
    retValue = pd.read_sql(query, conn, params=params_tuple)
    returnCount = len(retValue.index)
    if returnCount == 0:
        logger.warning('Fetch failed: no data retrieved.')
    else:
        logger.info(f'{returnCount} row(s) retrieved.')
    return retValue
def paramaterize(param, param_type):
    if param_type == "guid":
        return f"(SELECT CONVERT(uniqueidentifier, '{param}'))"
    elif param_type == "string":
        return f"'{param}'"
    elif param_type == "sysid":
        return f"[{param}]"
    else:
        return param
Let's say the parameters are 'mytable', 'ID', and '123456' [not literally - just pretend that's a real GUID]. I can go into SSMS right now, enter SELECT * FROM [mytable] WHERE [ID] = (SELECT CONVERT(uniqueidentifier, '123456')), and get 1 result. But in Python, I get zero results and no error.
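(In hindsight, the explanation is that pyodbc only ever binds ? markers as values - never as identifiers or SQL fragments - so the WHERE ? = ? version is really asking the server to compare two constant strings. Roughly:)

# conn is assumed to be an open pyodbc connection, as in the function above.
query = "SELECT * FROM [mytable] WHERE ? = ?"
params = ('[ID]', "(SELECT CONVERT(uniqueidentifier, '123456'))")
# Server-side this is effectively
#   SELECT * FROM [mytable] WHERE @P1 = @P2
# with both parameters bound as plain strings, i.e. a comparison of two unequal
# string literals: false for every row, hence 0 results and no error.
df = pd.read_sql(query, conn, params=params)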
What's extra weird is I still get 0 results and no error even if I do either of these:
- change filter_val = paramaterize(filter_val, "guid") to filter_val = paramaterize(filter_val, "string") (this is weird to me, because when I was trying the EXEC approach, I got an error that I understood to mean I could not filter a GUID column by an nvarchar value in a prepared statement!)
- remove filter_val = paramaterize(filter_val, "guid") entirely, even though SSMS seems to require those brackets

The comments and a bit more digging helped guide me to the solution I was looking for. This allows me to have a function that dynamically makes an SQL call given table name, column name, and GUID parameters while also (AFAIK) providing security against SQL injection:
def get_sql_entity_set_data(entity_set_name, filter_col=None, filter_val=None):
    connString = get_connection_string()
    conn = pyodbc.connect(connString)
    entity_set_name = paramaterize(entity_set_name)
    if filter_col is not None and filter_val is not None:
        filter_col = paramaterize(filter_col)
        query = f"""
            SET NOCOUNT ON;
            DECLARE @table nvarchar(128) = {entity_set_name};
            DECLARE @whereColumn nvarchar(128) = {filter_col};
            DECLARE @query nvarchar(max) = 'SELECT * FROM ' + QUOTENAME(@table) + ' WHERE ' + QUOTENAME(@whereColumn) + ' = ' + ?;
            EXEC sp_executesql @query
        """
        logger.debug(f'Attempting the following SQL command: {query}.')
        retValue = pd.read_sql(query, conn, params=(paramaterize(filter_val),))
    else:
        query = f"""
            SET NOCOUNT ON;
            DECLARE @table nvarchar(128) = {entity_set_name};
            DECLARE @query nvarchar(max) = 'SELECT * FROM ' + QUOTENAME(@table);
            EXEC sp_executesql @query
        """
        logger.debug(f'Attempting the following SQL command: {query}.')
        retValue = pd.read_sql(query, conn)
    returnCount = len(retValue.index)
    if returnCount == 0:
        logger.warning('Fetch failed: no data retrieved.')
    else:
        logger.info(f'{returnCount} row(s) retrieved.')
    return retValue
def paramaterize(param):
    return f"'{param}'"
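For example, with the same placeholder values as before (again, pretend '123456' is a real GUID):

# Fetch rows by GUID, or the whole table when no filter is given.
row = get_sql_entity_set_data('mytable', 'ID', '123456')
everything = get_sql_entity_set_data('mytable')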
Note: I learned that I couldn't use a ? for a column name either (just like I couldn't use one for a table name).
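For what it's worth, sp_executesql can also take the filter value through its own parameter list, so the GUID never has to be concatenated into the dynamic string at all. A sketch of that variant (same assumptions as my function above: entity_set_name and filter_col have already been through paramaterize(), conn is an open pyodbc connection, and filter_val is the raw GUID string):

# Variant where the GUID is passed as a typed parameter of sp_executesql
# instead of being concatenated into @query (sketch, not production code).
query = f"""
SET NOCOUNT ON;
DECLARE @table nvarchar(128) = {entity_set_name};
DECLARE @whereColumn nvarchar(128) = {filter_col};
DECLARE @query nvarchar(max) = 'SELECT * FROM ' + QUOTENAME(@table) + ' WHERE ' + QUOTENAME(@whereColumn) + ' = @filterVal';
EXEC sp_executesql @query, N'@filterVal uniqueidentifier', @filterVal = ?;
"""
retValue = pd.read_sql(query, conn, params=(filter_val,))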