postgresqlazurecollationicu

How does postgres know which collation to use during database creation if there are two with the same locale?


I have a Postgres DB created on Azure (probably not very important, added for context). That DB has predefined set of collations. All deterministic.

Among them there is a collation with name en-GB-x-icu with locale en-GB.

If I now create a new (non-deterministic) collation like this:

create collation en_gb_case_insensitive (
    provider = icu,
    locale = 'en-GB',
    deterministic = false
);

and then run query:

select 
collname,
collprovider,
collisdeterministic,
colliculocale,
collicurules
from 
pg_collation 
where 
collprovider = 'i' and
colliculocale like '%en-GB%'  

I will be presented with below result:

collname collprovider collisdeterministic colliculocale collicurules
en_gb_case_insensitive i false en-GB
en-GB-x-icu i true en-GB

If now I create database with below statement:

create database db_test002 
LOCALE_PROVIDER icu ICU_LOCALE 'en-GB' LOCALE 'en_GB.utf8' 
TEMPLATE template0;

Main question: Which collation will be used?

Support questions:

  1. Will the new DB collation be deterministic or not?
  2. How can I precisely define which collation I want to use?
  3. Does it make sense to create non-deterministic collation with language tag being just en-GB?
select datname, 
       datcollate,
       datctype,
       daticulocale,
       datcollversion
from pg_database
where datname = 'db_test002';
datname datcollate datctype daticulocale datcollversion
db_test002 en_GB.utf8 en_GB.utf8 en-GB 153.14

EDIT: My realizations:

  1. Creating new collation before creating new DB makes no sense :)
  2. Postgres will use the ICU_LOCALE to find "best matching" locale from existing ones
  3. Default collation will be deterministic always

So I can have databases like this

datname datlocprovider daticulocale
db_test001 i en-GB
db_test003 i en_gb_case_ignore
db_test004 i en-GB-u-ks-level2
db_test006 i en-GB-x-icu
  1. I can create undeterministic collation after db creatrion, with locale matching specific db locale but it will change noting :(

Solution

  • This is confusing. The collations you define in the database (and that are stored in the pg_collation catalog) are independent of the collations that are used in CREATE DATABASE. After all, the pg_collation catalog table of the new database is copied from the template, not from the database where you are currently connected.

    A collation used in CREATE DATABASE is always deterministic, as PostgreSQL does not yet support non-deterministic database collations. PostgreSQL will choose the deterministic ICU (or C library) collation that matches the string you specify in the CREATE DATABASE statement.

    To illustrate what I mean, look at this:

    CREATE COLLATION mycoll (
       PROVIDER = icu,
       LOCALE = 'en-GB-u-ks-level2',
       DETERMINISTIC = FALSE
    );
    
    CREATE DATABASE newdb
       TEMPLATE template0
       LOCALE_PROVIDER icu
       ICU_LOCALE mycoll
       LOCALE "en_GB.utf8";
    WARNING:  ICU locale "mycoll" has unknown language "mycoll"
    HINT:  To disable ICU locale validation, set the parameter "icu_validation_level" to "disabled".
    CREATE DATABASE
    

    mycoll is taken as a locale definition, but it matches no known language. The collation of the new database will be the deterministic "undefined" ICU collation.