Official spelling reforms in Scandinavian languages in the 19th and 20th centuries replaced digraphs (two-letter combinations) with single, distinct letters: aa → å, ae → æ (Danish) or ä (Swedish), and oe → ø (Danish) or ö (Swedish).
In a search context, users expect these forms to be treated as equivalent. However, with the JavaScript Intl API, only å = aa compares as equal. The others (æ = ae, ø = oe, ä = ae, ö = oe) do not.
Is this a known limitation of the JavaScript Intl API or the underlying ICU implementation, or am I missing a configuration option?
The code snippet below demonstrates the problem. A result of 0 indicates the collator treats the two strings as equivalent, which is what we expect for all these cases.
const options = { usage: 'search', sensitivity: 'base' };
const daCollator = new Intl.Collator('da', options);
const svCollator = new Intl.Collator('sv', options);
const results = [
  daCollator.compare('å', 'aa'), // 0 ✅ expected
  daCollator.compare('æ', 'ae'), // 1 ❌ unexpected
  daCollator.compare('ø', 'oe'), // 1 ❌ unexpected
  svCollator.compare('ä', 'ae'), // 1 ❌ unexpected
  svCollator.compare('ö', 'oe'), // 1 ❌ unexpected
];
const expected = [0, 0, 0, 0, 0];
console.log('Results:', results);
console.log('Expected:', expected);
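To rule out a silent fallback to the default locale, it may help to check what each collator actually resolved to. The sketch below uses only `Intl.Collator.supportedLocalesOf` and `resolvedOptions()`; the variable names `requestedLocales`, `searchOptions`, and `resolvedList` are illustrative, and the printed values depend on the ICU data shipped with the runtime.

```javascript
// Sketch: inspect which locales and options the runtime actually resolved.
const requestedLocales = ['da', 'sv'];
const searchOptions = { usage: 'search', sensitivity: 'base' };

// Subset of the requested locales that the runtime supports
// without falling back to its default locale.
const supportedLocales = Intl.Collator.supportedLocalesOf(requestedLocales);
console.log('Supported:', supportedLocales);

// Resolved options for each collator, in the same order as requestedLocales.
const resolvedList = requestedLocales.map((tag) =>
  new Intl.Collator(tag, searchOptions).resolvedOptions()
);

for (const resolved of resolvedList) {
  console.log(resolved.locale, resolved.usage, resolved.sensitivity, resolved.collation);
}
```

If `da` or `sv` is missing from the supported list, the comparison results say more about the runtime's locale data than about the Danish or Swedish tailoring.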
References
The support for å = aa in Danish is clearly documented in the UCA Specification:
For example, at a primary strength, "ß" would match against "ss" according to the UCA, and "aa" would match "å" in a Danish tailoring of the UCA.
It is also documented in the ICU User Guide:
For example, in Danish, ‘å’ (\u00e5) and ‘aa’ are considered equivalent.
å = aa is historically the most entrenched and universally accepted equivalence. My guess is that the CLDR maintainers are conservative about adding further equivalences, which may explain why the others are not implemented.
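If no configuration option exists, one possible stopgap is to fold the letters to their digraphs on both sides before comparing. This is only a sketch under that assumption: `DIGRAPH_FOLDS`, `foldDigraphs`, and `foldedCompare` are hypothetical helpers, not part of Intl or ICU, and the folding is deliberately crude (for example, it also makes unrelated words containing "oe" match words containing "ø").

```javascript
// Hypothetical folding table; not part of Intl or ICU.
const DIGRAPH_FOLDS = { 'æ': 'ae', 'ø': 'oe', 'ä': 'ae', 'ö': 'oe' };

function foldDigraphs(text) {
  // NFC composes sequences such as 'a' + combining diaeresis into 'ä',
  // so they match the table keys. Lowercasing is harmless here because
  // the collators below already ignore case (sensitivity: 'base').
  return text
    .normalize('NFC')
    .toLowerCase()
    .replace(/[æøäö]/g, (ch) => DIGRAPH_FOLDS[ch]);
}

function foldedCompare(collator, a, b) {
  // Fold both operands, then let the collator handle everything else
  // (including the built-in Danish å = aa equivalence).
  return collator.compare(foldDigraphs(a), foldDigraphs(b));
}

const foldOptions = { usage: 'search', sensitivity: 'base' };
const daFolding = new Intl.Collator('da', foldOptions);
const svFolding = new Intl.Collator('sv', foldOptions);

console.log(foldedCompare(daFolding, 'æ', 'ae')); // 0: both sides fold to 'ae'
console.log(foldedCompare(daFolding, 'ø', 'oe')); // 0: both sides fold to 'oe'
console.log(foldedCompare(svFolding, 'ä', 'ae')); // 0: both sides fold to 'ae'
console.log(foldedCompare(svFolding, 'ö', 'oe')); // 0: both sides fold to 'oe'
```

This sidesteps the collation tailoring rather than fixing it, so it does not answer whether the missing equivalences are intentional in CLDR.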