postgresqllocalecollationicupostgresql-11

postgresql 11 - create database with ICU locale


I want to use an ICU system-insensitive sorting collation, to avoid sorting differences between postgres11-on-mac vs postgres11-on-Ubuntu. My first test was to dump out my existing Collate=en_US.UTF-8 and pg_restore them into a db created with Collate=en-US-x-icu

Create Database doc has this to say:

To create a database music with a different locale:

CREATE DATABASE music LC_COLLATE 'sv_SE.utf8' LC_CTYPE 'sv_SE.utf8' TEMPLATE template0;

I seem to have the required icu locales already:

select collname, collprovider from pg_collation where collname like 'en_US%';

        collname        | collprovider
------------------------+--------------
 en_US.UTF-8            | c
 en_US                  | c
 en_US.ISO8859-15       | c
 en_US.ISO8859-1        | c
 en_US                  | c
 en_US                  | c
 en-US-x-icu            | i ๐Ÿ‘ˆ
 en-US-u-va-posix-x-icu | i ๐Ÿ‘ˆ
(8 rows)

But no luck when creating a database with either icu locales.

CREATE DATABASE test LC_COLLATE = 'en-US-x-icu' TEMPLATE template0;
ksysdb=# CREATE DATABASE test LC_COLLATE = 'en-US-x-icu' TEMPLATE template0;
ERROR:  invalid locale name: "en-US-x-icu"

I can use LC_COLLATE with other locales:

The LC_COLLATE clause does seem to come with some strings attached, such as watching your encoding and specifying an appropriate template. But it seems to give error hints w non-ICU locales.

This works, for example: CREATE DATABASE test LC_COLLATE = 'en_US' TEMPLATE template0;

and this one gives a helpful user message:

ksysdb=# CREATE DATABASE test LC_COLLATE = 'en_US.ISO8859-15' TEMPLATE template0;
ERROR:  encoding "UTF8" does not match locale "en_US.ISO8859-15"
DETAIL:  The chosen LC_COLLATE setting requires encoding "LATIN9".

Note: a related question, PostgreSQL 10 on Linux - LC_COLLATE locale en_US.utf-8 not valid, doesn't seem all that relevant, as the answer talks about generating an OS-level locale to fix the issue. While the ICU locales, as far as I understand, are expressly intended to be separated from the underlying OS.


Solution

  • This is possible from PostgreSQL v15 on:

    CREATE DATABASE test
       LOCALE_PROVIDER icu
       ICU_LOCALE "en-US"
       LOCALE "en_US.utf8"
       TEMPLATE template0;
    

    You still need to provide the libc locale, but the ICU library is used for collations.