Consider this simple Row Level Security example:
CREATE TABLE accounts (
id integer PRIMARY KEY,
email text,
password text,
);
-- creating RLS policy
ALTER TABLE accounts ENABLE ROW LEVEL SECURITY;
CREATE POLICY own_rows
ON accounts
FOR SELECT
USING (owner_id = current_setting('somenamespace.current_user'));
-- example query using RLS
set_config('somenamespace.current_user', 123, true) -- true for local scoped variable (lives till the end of tx)
SELECT email,password FROM accounts
PostgreSQL RLS is basically just the same as writing WHERE user_id = $1 clause at the end of each query that must access a private row. In practice that means on each query you must write something like set_config('') or SET LOCAL first, followed by the main query, all that wrapped in a transaction. This raises the following complexities:
tx.Begin + tx.Commit + tx.Rollback + set_config + core query vs just querySET command - commonly used in guides (!!!) on RLS while clearly being a security flaw if used like:-- declare policy
...
USING (owner_id = current_user)
-- execute query
SET current_user = 123
SELECT email,password FROM accounts
since SET command works per connection - once returned to connection pool SET leftovers remain in that connection, which may cause unrelated query/user to access data of previous user. Some manuals suggest mitigation by prepending SET LOCAL or RESET command which is even more peculiar (I bet I even saw that in Supabase doc):
RESET current_user
SET current_user = 123
...
However, it's still unclear if SET arbitrary_namespace.current_user = 123 remains in a connection, i.e. https://www.bytebase.com/blog/postgres-row-level-security-footguns/#10-connection-pooling-context-loss (note dot namespace for the variable)
-- Use application-controlled session variables
-- App sets per transaction:
SET app.user_id = 'user-uuid';
SET app.tenant_id = 'tenant-uuid';
App sets per transaction
NO IT IS NOT !!!! -> https://www.postgresql.org/docs/current/sql-set.html
Once the surrounding transaction is committed, the effects will persist until the end of the session, unless overridden by another SET.
Session AFAIU means connection, which means literally any consequent query executed by the client who got assigned that connection by the Connection Pool!
WHERE clause, I'll omit that part but it may be found on the link above tooSUMMARIZE:
get_user_auth_pair(user_id) returning email,password?PostgreSQL RLS is basically just the same as writing
WHERE user_id = $1clause at the end of each query
More complex, multi-statement queries, or things like insert..on conflict and merge can be much less trivial to properly lock down by strategically injecting where clauses. Plus, RLS extends to view definitions with a security_barrier enabled.
The doc offers an example how things can leak through a where that should otherwise hide things, that security_barrier prevents.
You're right that logically, it is exactly the same.
Functionally, it's arguably more convenient to use because it's all in one place, with some added features on top: you can switch specific policies on and off, turn the whole thing on or off, elevate to superuser to ignore all, choose what and who they apply to, and when. Emulating all that with plain wheres, you have to implement all of this yourself, which adds code to link and manipulate elements of your queries, which usually requires that you split them up into smaller, modular blocks, decreasing readability.
you must write something like
set_config('')orSET LOCALfirst
alter the setting at system, db or role level, once.sqlalchemy.events.PoolEvents, in node-postgres pg.Pool is an EventEmitter) and that can also include varying set role.set local and set_config('a.s','v',/*is_local->*/true) and force everything to use explicit transaction handling, just to have the end of transaction trigger its cleanup - you can instead leave that to the pool to handle on specific events.I agree interleaving all SQL code with constant setting updates and explicit transaction handling just to conform to RLS policies sounds like a headache. It's generally possible to set things up to minimise or completely hide away the sets and only do transaction handling when it's strictly necessary, leaving the rest to the pool config.
Given long enough queries, even if you did not delegate the sets and explicit transaction handling commands to the pool, they might easily turn out to contribute way less volume and noise to your code base compared to extending all your SQL with where clauses equivalent to your RLS policies - even before accounting for code that'll manage/enable/disable these elements.
Compare begin+set+base_query+commit+discard all versus base_query+added where in 2 CTEs+added where in 2 subqueries(+ the whole thing's split up into blocks to be detachable).
since
SETcommand works per connection - once returned to connection poolSETleftovers remain in that connection, which may cause unrelated query/user to access data of previous user.unclear if
SET arbitrary_namespace.current_user = 123remains in a connection
That's a pool/connection lifecycle issue, not an RLS one. The pool should handle the cleanup on its own. In addition to the earlier examples, in pgpool there's this
reset_query_list(string)
Specifies the SQL commands to be sent to reset the backend connection when exiting the user session.
In pgbouncer there's this:
server_reset_query
Query sent to server on connection release, before making it available to other clients.
And in both cases the default is discard all which includes reset all, undoing all previous sets in that connection. In SQLAlchemy, the reset-on-return scheme is configurable.
In the Bytebase post I think they only tried to suggest making the app set, and the RLS check only those - "App sets:" would suffice without confusing things by adding "per transaction". It could make more sense to say per operation (or any unit of work) because some pools offer statement-level pooling, in which case even set local would still affect other users of the pool. They might've assumed the reader will be using transaction-level pooling with automatic cleanup on checkout/checkin/release.
You can review examples from Supabase that use their auth.uid() extensively in RLS.
It introduces indexing/performance issues compared to simple
WHEREclause
Formulating good where clauses, or even debugging, improving, optimising DDL+DML are more common skills than those related to RLS, so I'd argue most of the performance issues are due to the fact these skills don't necessarily translate, not because RLS itself is slow. It's made to be transparent and no different from an equivalent where clause, performance wise.
There are some inherent losses due to security_barrier:
Views created with the
security_barriermay perform far worse than views created without this option. In general, there is no way to avoid this: the fastest possible plan must be rejected if it may compromise security. For this reason, this option is not enabled by default.
But that's just cost of something it would cost way more to emulate without the built-in feature.
Weighed against front-end boilerplate, doesn't sound that bad. Considering something, somewhere has to handle this logic, I don't think the db is such a bad candidate, especially since it's all already built-in and battle-tested.
SUMMARIZE:
- Considering the above, why even use RLS instead of simple
WHEREclause or an SQL function which takesuser_idas an argument? For the sake of "readability" and "explicit declarative permissions"? Apart from that, what real benefits it gives compared toget_user_auth_pair(user_id)returningemail,password?
These aren't mutually exclusive. It's common to write a function and plug it into the RLS policy. The difference is that you set up the policies once and maintain them in one place, while a "where policy" would have to be copied into and maintained in each query it's meant to affect. Also, RLS remains beyond users' reach - even if they somehow manage to disable the force-attached where or run entirely custom queries, they won't be able to shake off the RLS-supplied conditions.
- What am I missing in bytebase implementation? Because concerns related to
SETleftovers in a connection are definitely confirmed by multiple sources.
It should be handled by the pool or whatever manages the connection lifecycle, as mentioned earlier.