I've got a problem sorting lists using Unicode collation in Python 2.5.1 and 2.6.5 on OS X, as well as on Linux.
# -*- coding: utf-8 -*-
import locale
locale.setlocale(locale.LC_ALL, 'pl_PL.UTF-8')
# sorted() already returns a list, so no extra list comprehension is needed
print sorted([u'a', u'z', u'ą'], cmp=locale.strcoll)
Which should print:
[u'a', u'ą', u'z']
But instead prints out:
[u'a', u'z', u'ą']
In short, it looks as if strcoll is broken. I also tried it with other input types (e.g. encoded, non-unicode strings).
What am I doing wrong?
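For what it's worth, the order I actually get is the same as plain code point order, i.e. what you would see if no collation were applied at all. A quick check that involves no locale (the variable names are only for illustration):

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

words = [u'a', u'z', u'ą']

# With no cmp/key given, unicode strings are compared by code point
codepoint_sorted = sorted(words)
codepoints = [ord(w) for w in codepoint_sorted]

print(codepoints)  # [97, 122, 261]
```

Since u'ą' is U+0105 (261), it lands after u'z' (122), which matches the wrong output above.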
Best regards, Tomasz Kopczuk.
Apparently, the only way for sorting to work on all platforms is to use the ICU library with PyICU bindings (PyICU on PyPI).
On OS X, install it with MacPorts:
sudo port install py26-pyicu
Mind the bug described here: https://svn.macports.org/ticket/23429 (oh, the joy of using MacPorts).
PyICU's documentation is unfortunately severely lacking, but I managed to find out how it's done:
import PyICU
collator = PyICU.Collator.createInstance(PyICU.Locale('pl_PL.UTF-8'))
# collator.compare is a cmp-style function, so it can be passed to sorted() directly
print sorted([u'a', u'z', u'ą'], cmp=collator.compare)
which gives:
[u'a', u'ą', u'z']
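On newer Pythons, where sorted() no longer accepts cmp=, roughly the same thing can be done with a sort key. This is only a sketch: it assumes a recent PyICU release (which is imported as icu rather than PyICU), sort_polish is a made-up helper name, and it simply returns None if PyICU isn't installed.

```python
# -*- coding: utf-8 -*-
try:
    import icu  # assumption: recent PyICU releases install the module as "icu"
except ImportError:
    icu = None


def sort_polish(words):
    """Return words sorted by Polish collation, or None if PyICU is missing."""
    if icu is None:
        return None
    collator = icu.Collator.createInstance(icu.Locale('pl_PL'))
    # getSortKey returns a byte string; comparing those bytes
    # gives the same order as collator.compare
    return sorted(words, key=collator.getSortKey)


print(sort_polish([u'a', u'z', u'ą']))
```

The collator is created inside the function, so nothing global such as setlocale() is touched.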
Another plus, @bobince: it's thread-safe, so it stays usable when the locale has to be set per request.