I have a Postgres DB created on Azure (probably not very important, added for context). That DB has predefined set of collations. All deterministic.
Among them there is a collation with name en-GB-x-icu
with locale en-GB
.
If I now create a new (non-deterministic) collation like this:
create collation en_gb_case_insensitive (
provider = icu,
locale = 'en-GB',
deterministic = false
);
and then run query:
select
collname,
collprovider,
collisdeterministic,
colliculocale,
collicurules
from
pg_collation
where
collprovider = 'i' and
colliculocale like '%en-GB%'
I will be presented with below result:
collname | collprovider | collisdeterministic | colliculocale | collicurules |
---|---|---|---|---|
en_gb_case_insensitive | i | false | en-GB | |
en-GB-x-icu | i | true | en-GB |
If now I create database with below statement:
create database db_test002
LOCALE_PROVIDER icu ICU_LOCALE 'en-GB' LOCALE 'en_GB.utf8'
TEMPLATE template0;
Main question: Which collation will be used?
Support questions:
en-GB
?select datname,
datcollate,
datctype,
daticulocale,
datcollversion
from pg_database
where datname = 'db_test002';
datname | datcollate | datctype | daticulocale | datcollversion |
---|---|---|---|---|
db_test002 | en_GB.utf8 | en_GB.utf8 | en-GB | 153.14 |
EDIT: My realizations:
So I can have databases like this
datname | datlocprovider | daticulocale |
---|---|---|
db_test001 | i | en-GB |
db_test003 | i | en_gb_case_ignore |
db_test004 | i | en-GB-u-ks-level2 |
db_test006 | i | en-GB-x-icu |
This is confusing. The collations you define in the database (and that are stored in the pg_collation
catalog) are independent of the collations that are used in CREATE DATABASE
. After all, the pg_collation
catalog table of the new database is copied from the template, not from the database where you are currently connected.
A collation used in CREATE DATABASE
is always deterministic, as PostgreSQL does not yet support non-deterministic database collations. PostgreSQL will choose the deterministic ICU (or C library) collation that matches the string you specify in the CREATE DATABASE
statement.
To illustrate what I mean, look at this:
CREATE COLLATION mycoll (
PROVIDER = icu,
LOCALE = 'en-GB-u-ks-level2',
DETERMINISTIC = FALSE
);
CREATE DATABASE newdb
TEMPLATE template0
LOCALE_PROVIDER icu
ICU_LOCALE mycoll
LOCALE "en_GB.utf8";
WARNING: ICU locale "mycoll" has unknown language "mycoll"
HINT: To disable ICU locale validation, set the parameter "icu_validation_level" to "disabled".
CREATE DATABASE
mycoll
is taken as a locale definition, but it matches no known language. The collation of the new database will be the deterministic "undefined" ICU collation.