I want the user to be able to choose what order results are displayed e.g. by age), and I don't want to sort them after getting them from the database.
Obviously if the user is able to specify input that affects SQL commands, it needs to be sanitised, and I would normally use parameterisation, but pysqlite seems to ignore parameters for anything except values.
Example code is below showing parameterisation not working for ORDER BY
, and also a workaround using string formatting, but that is vulnerable to SQL injection.
What is the recommended solution to allow user input to affect sort order without exposing SQLi vulnerabilities? Do I have to use string formatting and check every user input manually?
#!/user/bin/env python3
import sqlite3
con = sqlite3.connect(':memory:')
cur = con.cursor()
cur.execute('CREATE TABLE test (name, age)')
cur.execute('INSERT INTO test VALUES (:name, :age)', {'name': 'Aaron', 'age': 75})
cur.execute('INSERT INTO test VALUES (:name, :age)', {'name': 'Zebedee', 'age': 5})
cur.execute('SELECT * FROM test ORDER BY age ASC')
results = cur.fetchall()
print('\nGood, but hard coded:\n', results)
# Good, but hard coded:
# [('Zebedee', 5), ('Aaron', 75)]
cur.execute('SELECT * FROM test ORDER BY :order_by ASC', {'order_by': 'age'})
results = cur.fetchall()
print('\norder_by parameter ignored:\n', results)
# order_by parameter ignored:
# [('Aaron', 75), ('Zebedee', 5)]
cur.execute('SELECT * FROM test ORDER BY {order_by} ASC'.format(order_by='age'))
results = cur.fetchall()
print('\nRight order, but vulnerable to SQL injection:\n', results)
# Right order, but vulnerable to SQL injection:
# [('Zebedee', 5), ('Aaron', 75)]
con.close()
SQL parameters are used only for values; anything else could change the meaning of the query. (For example, ORDER BY password
could leave hints, as could ORDER BY (SELECT ... FROM OtherTable ...)
.)
To ensure that the column name from the client is valid, you could use a whitelist:
if order_by not in ['name', 'age']:
raise ...
execute('... ORDER BY {}'.format(order_by))
But it is still a bad idea to integrate that string into the query, because the validation and the actual table could go out of sync, or you could forget the check. Better return a column index from the client, so that the actual string you're using is always your own, and any errors can be easily found during normal testing:
order_by = ['name', 'age'][order_index]
execute('... ORDER BY {}'.format(order_by))