
How do I do a "natural sort" in PyICU?


Using PyICU, how can I use a Collator to sort a list of strings by "natural order", i.e., putting 10 after 2 instead of before?

In the ICU docs http://userguide.icu-project.org/collation/customization#TOC-Default-Options, I can see that there is a "numericOrdering" option (a.k.a. UCOL_NUMERIC_COLLATION) that can be set on or off, but I can't figure out how to set that attribute from Python code.


Solution

  • Call the .setAttribute method on the Collator instance.

    The attribute and its value come from two enums exposed on the top-level icu module, UCollAttribute and UCollAttributeValue:

    import icu
    
    collator = icu.Collator.createInstance(icu.Locale('en_US'))
    collator.setAttribute(icu.UCollAttribute.NUMERIC_COLLATION, icu.UCollAttributeValue.ON)
    
    sorted(['3 three', '1 one', '10 ten', '2 two'])
    # ['1 one', '10 ten', '2 two', '3 three']
    
    sorted(['3 three', '1 one', '10 ten', '2 two'], key=collator.getSortKey)
    # ['1 one', '2 two', '3 three', '10 ten']