pythonpostgresqlsecuritypsycopg2sql-injection

Why does psycopg2 still allow SQL injection with dynamically constructed table names


I'm developing a multi-tenant Python web application where users need to query different database tables based on their client context. The challenge is constructing secure SQL queries when table names must be determined dynamically at runtime. The core problem is that while psycopg2's parameterized queries effectively prevent SQL injection for data values, they cannot be used for table or column names. When I attempt to parameterize a table name using standard placeholders, PostgreSQL throws a syntax error because identifiers (table/column names) cannot be parameterized in the same way as values. This leaves me in a situation where I must construct SQL strings dynamically, which traditionally opens the door to SQL injection attacks. For example, if an attacker can control the table name input, they could potentially inject malicious SQL code that gets executed alongside the intended query.

I've discovered psycopg2's sql.Identifier() function which claims to safely handle dynamic identifiers, but I need to understand its limitations and whether additional validation layers are necessary. The application handles sensitive financial data across multiple clients, so security is paramount, but I also need to maintain good performance and code maintainability.

import psycopg2

def get_user_data(connection, table_name, user_id):
    query = f"SELECT * FROM {table_name} WHERE user_id = %s"
    cursor = connection.cursor()
    cursor.execute(query, (user_id,))
    return cursor.fetchall()

#See an example of the security risk:
malicious_table = "users; DROP TABLE users; --"
get_user_data(conn, malicious_table, 123)  #This would execute the DROP command 
I have tried doing the direct parameterization:

cursor.execute("SELECT * FROM %s WHERE user_id = %s", (table_name, user_id))

I expected the query to execute safely with the table name properly escaped.

But then the actual result: 
psycopg2.errors.SyntaxError: syntax error at or near "$1"

PostgreSQL doesn't allow parameterized identifiers.
I have tried F-string formatting, also tried using sql.Identifier as seen below:

from psycopg2 import sql  
query = sql.SQL("SELECT * FROM {} WHERE user_id = %s").format(sql.Identifier(table_name)) 
cursor.execute(query, (user_id,))

This works, but I am unsure about edge cases

The the whitelist + sql.Identifier() is seen below

ALLOWED_TABLES = {'client_1_transactions', 'client_2_transactions'}
if table_name not in ALLOWED_TABLES:
    raise ValueError(f"Invalid table: {table_name}")

I am thinking the whitelist + sql.Identifier() is an overkill, not sure though.
I would love to know the following:


Solution

  • Is psycopg2.sql.Identifier() truly safe against all forms of SQL injection for table/column names, or are there edge cases I should be aware of?

    Yes, it is safe. The issues you have to be aware of:

    The Identifier() method handles both of these cases, and that makes it safe to prevent SQL injection, if you use it consistently.

    If you don't use it consistently, or if you have other developers on your team who don't, then that's a problem. But no framework can help with bad developer habits. The framework enforces security like a toothbrush prevents cavities. It's up to you to use it consistently.

    What's the recommended approach for validating table names before using them in dynamic queries? Should I maintain a whitelist, or is there a more elegant solution?

    I use a whitelist. Some people call it an allowlist. That's the same thing.

    You could alternatively query the system views dynamically to validate table names, but this isn't more elegant, and it slows down your application. I've seen apps that do this, and they struggle with performance.

    Also the system views only tell you that a table exists by that name. It doesn't tell you that the table is valid for the specific query you want to run. You might have application-specific rules, for example only table names of a certain naming convention are valid for a given query. Using the allowlist solution, you can list a short subset of tables. Or you could validate using patterns instead of fixed table names.

    Are there any performance implications when using sql.SQL() and sql.Identifier() compared to regular parameterized queries?

    No. All of the sql.SQL() formatting, such as dynamic table name, happens in your application code before you send the query to PostgreSQL. The SQL engine parses and executes it exactly as if it had been a fixed query with no dynamic parts. How you built that query has no effect on how it is executed.