python collation icu

RuleBasedCollator rule ignored


I'm trying to use the ICU RuleBasedCollator in Python. In my code I specify a rule whereby "ä" should sort before "a" as a secondary (accent) difference:

from icu import RuleBasedCollator

l=["a","ä"]
rbc = RuleBasedCollator('\n&ä<<a')
sorted(l, key=rbc.getSortKey)

However, the output of sorted() is:

['a', 'ä']

I expected ['ä', 'a']. What did I do wrong?

Many thanks


Solution

  • It appears that the difference between "a" and "ä" is treated as a primary one here. Resetting to the position before "a" with [before 1] and placing "ä" there with a primary relation (<) gives the expected order:

    from icu import RuleBasedCollator
    
    l=["a","ä"]
    # Reset to the position just before "a" at the primary level, then put "ä" there.
    rbc = RuleBasedCollator('&[before 1]a < ä')
    print(sorted(l, key=rbc.getSortKey))  # ['ä', 'a']
    

    To read more
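
One consequence of the answer's rule is worth spelling out: `<` is a primary relation, so "ä" becomes a letter of its own that sorts before "a", rather than an accented "a". The sketch below is a toy two-level sort key in plain Python, not ICU's algorithm or its real weights. The names `toy_key` and `tailored` and the weight `ord("a") - 0.5` are made up for illustration. It only shows how an accent-level (secondary) difference and a primary difference order longer words differently.

```python
import unicodedata

A_UMLAUT = "\u00e4"  # precomposed "ä"

def toy_key(word, tailored=None):
    """Toy sort key: (primary weights, secondary weights).

    `tailored` is a hypothetical mapping from a character to a custom
    primary weight, loosely mimicking a primary-level tailoring rule.
    """
    tailored = tailored or {}
    primary, secondary = [], []
    for ch in word:
        if ch in tailored:
            # Tailored character: treated as a separate letter.
            primary.append(tailored[ch])
            secondary.append(0)
        else:
            # Untailored: base letter decides the primary weight,
            # the number of combining marks decides the secondary weight.
            decomposed = unicodedata.normalize("NFD", ch)
            primary.append(ord(decomposed[0]))
            secondary.append(len(decomposed) - 1)
    return (primary, secondary)

words = ["ab", A_UMLAUT + "z", "a", A_UMLAUT]

# Accent is only a secondary difference: it matters only as a tie-breaker.
accent_only = sorted(words, key=toy_key)

# "ä" gets its own primary weight just below "a" (assumed value).
own_letter = sorted(words, key=lambda w: toy_key(w, {A_UMLAUT: ord("a") - 0.5}))

print(accent_only)
print(own_letter)
```

In the first ordering "äz" still sorts after "ab", because the accent is consulted only when the base letters are equal. In the second, every word starting with "ä" sorts before every word starting with "a". If only the accent-level ordering from the question is wanted, the primary rule from the answer may therefore reorder more than intended.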