postgresqlconcurrencyddlcatalog

CREATE SCHEMA IF NOT EXISTS raises duplicate key error


To give some context, the command is issued inside a task, and many task might issue the same command from multiple workers at the same time.

Each tasks tries to create a postgres schema. I often get the following error:

IntegrityError: (IntegrityError) duplicate key value violates unique constraint "pg_namespace_nspname_index"
DETAIL:  Key (nspname)=(9621584361) already exists.
 'CREATE SCHEMA IF NOT EXISTS "9621584361"'

Postgres version is PostgreSQL 9.4rc1.
Is it a bug in Postgres?


Solution

  • This is a bit of a wart in the implementation of IF NOT EXISTS for tables and schemas. Basically, they're an upsert attempt, and PostgreSQL doesn't handle the race conditions cleanly. It's safe, but ugly.

    If the schema is being concurrently created in another session but isn't yet committed, then it both exists and does not exist, depending on who you are and how you look. It's not possible for other transactions to "see" the new schema in the system catalogs because it's uncommitted, so it's entry in pg_namespace is not visible to other transactions. So CREATE SCHEMA / CREATE TABLE tries to create it because, as far as it's concerned, the object doesn't exist.

    However, that inserts a row into a table with a unique constraint. Unique constraints must be able to see uncommitted rows in order to function. So the insert blocks (stops) until the first transaction that did the CREATE either commits or rolls back. If it commits, the second transaction aborts, because it tried to insert a row that violates a unique constraint. CREATE SCHEMA isn't smart enough to catch this case and re-try.

    To properly fix this PostgreSQL would probably need predicate locking, where it could lock the potential for a row. This might get added as part of the current work going on for implementing UPSERT.

    For these particular commands, PostgreSQL could probably do a dirty read of the system catalogs, where it can see uncommitted changes. Then it could wait for the uncommitted transaction to commit or roll back, re-do the dirty read to see if someone else is waiting, and retry. But this would have a race condition where someone else might create the schema between when you do the read to check for it and when you try to create it.

    So the IF NOT EXISTS variants would have to:

    As far as I know nobody's implemented that, or they tried and it wasn't accepted. There would be possible issues with transaction ID burn rate, etc, with this approach.

    I think this is a bug of sorts, but it's a "yeah, we know" kind of bug, not a "we'll get right on fixing that" kind of bug. Feel free to post to pgsql-bugs about it; at the very least the documentation should mention this caveat about IF NOT EXISTS.

    I don't recommend doing DDL concurrently like that.