
cmp_to_key is not working in python3 for .csv files


I'm working with .csv files, so I need to sort by a specific column. The approach from this answer doesn't work:

sorting with two key= arguments

thus using the idea from

How do I sort unicode strings alphabetically in Python?

we have, in Python 2:

import icu # conda install -c conda-forge pyicu
collator = icu.Collator.createInstance(icu.Locale('el_GR.UTF-8'))
parts = [('3', 'ά', 'C'),
         ('6', 'γ', 'F'),
         ('5', 'β', 'E'),
         ('4', 'Ἀ', 'D'),
         ('2', 'Α', 'B'),
         ('1', 'α', 'A')]
foo = sorted(parts, key=lambda s: (s[1]), cmp=collator.compare)
for c in foo: 
  print c[0], c[1].decode('utf-8'), c[2]

with the correct result:

1 α A
2 Α B
4 Ἀ D
3 ά C
5 β E
6 γ F

but in Python 3:

import icu # conda install -c conda-forge pyicu
from functools import cmp_to_key
collator = icu.Collator.createInstance(icu.Locale('el_GR.UTF-8'))
parts = [('3', 'ά', 'C'),
         ('6', 'γ', 'F'),
         ('5', 'β', 'E'),
         ('4', 'Ἀ', 'D'),
         ('2', 'Α', 'B'),
         ('1', 'α', 'A')]
foo = sorted(parts, key=lambda s: (s[1], collator.getSortKey))
#foo = sorted(parts, key=lambda s: (s[1], collator.compare))#the same result as collator.getSortKey
for c in foo: 
  print (c[0], c[1], c[2])

with the wrong result:

2 Α B
1 α A
5 β E
6 γ F
4 Ἀ D
3 ά C

Solution

  • I think you're calling sorted with the wrong key function.

    From docs.python.org:

    The value of the key parameter should be a function that takes a single argument and returns a key to use for sorting purposes. This technique is fast because the key function is called exactly once for each input record.

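    Your key lambda returns a tuple containing the character and a function object. The function itself is never called.

    Python 3 compares tuples element by element, so "Α" is compared to "α" first (by Unicode code point, not alphabetically). Only if those two strings were equal would the second elements be compared, and both of them are the same uncalled collator.getSortKey method, so the collator never influences the order.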

    I think you want the following lambda; I believe it expresses what you want to happen.

    foo = sorted(parts, key=lambda s: collator.getSortKey(s[1]))
    

    This should sort alphabetically, following the collator's rules, instead of by code point.
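
Since the question is about sorting .csv rows by one column, here is a minimal sketch of that pattern using only the standard library. The names sort_rows_by_column and compare_ignoring_case are hypothetical, and a case-insensitive comparison stands in for the ICU collator so the example runs without PyICU. With PyICU installed, you would presumably pass collator.getSortKey in the first call and cmp_to_key(collator.compare) in the second.

```python
import csv
import io
from functools import cmp_to_key


def sort_rows_by_column(rows, column, sort_key):
    """Sort rows on one column; sort_key maps a single cell to a comparable value."""
    # sorted() passes the whole row to the key function, so the key function
    # must pick the column itself and then apply sort_key to that one cell.
    return sorted(rows, key=lambda row: sort_key(row[column]))


# Hypothetical stand-in for collator.compare: a cmp-style function that
# returns a negative, zero, or positive number.
def compare_ignoring_case(a, b):
    x, y = a.casefold(), b.casefold()
    return (x > y) - (x < y)


# In-memory stand-in for a .csv file, so the sketch needs no file on disk.
data = io.StringIO("3,cherry,C\n2,Banana,B\n1,apple,A\n")
rows = list(csv.reader(data))

# Option 1: a plain key function (the role collator.getSortKey plays).
by_key = sort_rows_by_column(rows, 1, str.casefold)

# Option 2: a cmp-style function adapted with cmp_to_key
# (the role cmp_to_key(collator.compare) would play).
by_cmp = sort_rows_by_column(rows, 1, cmp_to_key(compare_ignoring_case))

for row in by_key:
    print(*row)
```

Either way, the function is applied to the cell itself rather than placed next to it in a tuple, which is the difference from the key in the question.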