I have a project that requires we allow users to create custom columns, enter custom values, and use these custom values to execute user defined functions.
Similar Functionality In Google Data Studio
We have exhausted all implementation strategies we can think of (executing formulas on the front end, in isolated execution environments, etc.).
Short of writing our own interpreter, the only implementation we could find that meets the performance, functionality, and scalability requirements is to execute these functions directly within MySQL. So basically taking the expressions that have been entered by the user, and dynamically rolling up a query that computes results server side in MySQL.
This obviously opens a can of worms security wise.
Quick aside: I expect to get the "you shouldn't do it that way" response. Trust me, I hate that this is the best solution we can find. The resources online describing similar problems is remarkably scarce, so if there are any suggestions for where to find information on analogous problems/solutions/implementations, I would greatly appreciate it.
With that said, assuming that we don't have alternatives, my question is: How do we go about doing this safely?
We have a few current safeguards set up:
This gets us pretty far. But I'm still not comfortable with it. One additional option that I couldn't find any info online about was using the query execution plan as a means of detecting if the query is going outside of its bounds.
So prior to actually executing the query/getting the results, you would wrap it within an EXPLAIN statement to see what the dynamic query was doing. From the results of the EXPLAIN query, you should able to detect any operations (subqueries, key references, UNIONs, etc.) that fall outside of the bounds of what the query is allowed to do.
Is this a useful validation method? It seems to me that this would be a powerful tool for protecting against a suite of SQL injections, but I couldn't seem to find any information online.
Thanks in advance!
(from Comment)
Some Examples showing the actual autogenerated queries being used. There are both visual and list examples showing the query execution plan for both malicious and valid custom functions.
GRANT
only SELECT
on the table(s) that they are allowed to manipulate. This allows arbitrarily complex SELECT
queries to be run. (The one flaw: Such queries may run for a long time and/or take a lot of resources. MariaDB has more facilities for preventing run-away selects.)
Provide limited "write" access via Stored Routines with expanded privileges, but do not pass arbitrary values into them. See SQL SECURITY
: DEFINER
has the privileges of the person creating the routine. (As opposed to INVOKER
is limited to SELECT
on the tables mentioned above.)
Another technique that may or may not be useful is creating VIEWs
with select privileges. This, for example, can let the user see most information about employees while hiding the salaries.
Related to that is the ability to GRANT
different permissions on different columns, even in the same table.
(I have implemented a similar web app, and released it to everyone in the company. And I could 'sleep at night'.)
I don't see subqueries and Unions as issues. I don't see the utility of EXPLAIN
other than to provide more info in case the user is a programmer trying out queries.
EXPLAIN
can help in discovering long-running queries, but it is imperfect. Ditto for LIMIT
.
More
I think "UDF" is either "normalization" or "EAV"; it is hard to tell which. Please provide SHOW CREATE TABLE
.
This is inefficient because it builds a temp table before removing the 'NULL' items:
FROM ( SELECT ...
FROM ...
LEFT JOIN ...
) AS context
WHERE ... IS NULL
This is better because it can do the filtering sooner:
FROM ( SELECT ...
FROM ...
LEFT JOIN ...
WHERE ... IS NULL
) AS context