Tags: python, sorting, lexicographic, lexicographic-ordering, custom-sort

Python custom sort: reverse lexicographic order, but with the shortest common prefix first


Background

I want to sort in reverse, but not in strictly lexicographic order, and then it gets even weirder. :P

The reason is that a piece of proprietary software parses directories exactly the way I describe here, and I want to copy that behavior.

Requirements (in that order)

  1. Compatible with both Python 2 and Python 3
  2. Reverse lexicographic order
  3. Shortest common prefix first (a name comes before longer names that start with it)

Example data

The following is an example of (randomly ordered) input data for the Python script:

IA-test-PROD-me
ia-test-prod-me
ia-test-me-staging
ia-test-me
ia-test-STAGING-me
IA-test-me
IA-test-me-staging
ia-test-me-prod
IA-test-me-STAGING
IA-test-me-prod
IA-test-me-PROD
IA-test-STAGING-me

How it should look

I store that in a list and need to sort it so that it ends up looking like this:

ia-test-me
ia-test-prod-me
ia-test-me-staging
ia-test-me-prod
ia-test-STAGING-me
IA-test-me
IA-test-me-staging
IA-test-me-prod
IA-test-me-STAGING
IA-test-me-PROD
IA-test-STAGING-me
IA-test-PROD-me

Code

From what I understand, sort() and sorted() are stable and sort strings lexicographically. But since I need to meet all of the requirements above, I am stuck at the moment.

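def sortLexo(input_list):
    # Despite its name, input_list is a whitespace-separated string of names.
    words = input_list.split()
    # Plain reverse lexicographic sort; this puts a shorter prefix last.
    words.sort(reverse=True)

    for i in words:
        print(i)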

The problem is that sort() with reverse=True alone is not enough, as it does not fulfill requirement 3 (shortest first) above:

           <-------------. should be placed here
ia-test-prod-me          |
ia-test-me-staging      /|\
ia-test-me-prod          |
ia-test-me    -------> wrong
ia-test-STAGING-me
           <--------------- should be placed here
IA-test-me-staging        |
IA-test-me-prod          /|\
IA-test-me-STAGING        |
IA-test-me-PROD           |
IA-test-me    --------> wrong
IA-test-STAGING-me
IA-test-PROD-me
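
To illustrate the behavior shown above, here is a minimal sketch using three of the example names (the variable names are only for illustration). In ascending lexicographic order a string sorts before any longer string that starts with it, so reversing the order pushes the shortest name to the end of its group:

```python
names = ["ia-test-me", "ia-test-me-prod", "ia-test-me-staging"]

# Ascending: a string sorts before any longer string it is a prefix of.
ascending = sorted(names)

# Descending: the same rule now places the shortest name last.
descending = sorted(names, reverse=True)

print(ascending)
print(descending)
```

This is why requirement 3 needs something beyond reverse=True.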

I've played around with groupby to sort by length, but I got nowhere (my Python knowledge isn't that deep). :(

I guess this is super easy for someone with good Python know-how. Any help appreciated!


Solution

  • Trying to piece this together based on the description. It seems like you want to pad the right side of the comparison string with the highest character you expect to receive (I use the character 0xFF, but if you're using Unicode instead of ASCII you might need a higher number).

    # Pad every word to the longest length so that a shorter prefix compares highest.
    MAX_LENGTH = max(len(word) for word in words)
    result = sorted(words, key=lambda word: word + "\xFF" * (MAX_LENGTH - len(word)), reverse=True)
    

    This produces the following. It differs from the output in your question, but I can't work out what specification would produce that output.

    ia-test-prod-me
    ia-test-me
    ia-test-me-staging
    ia-test-me-prod
    ia-test-STAGING-me
    IA-test-me
    IA-test-me-staging
    IA-test-me-prod
    IA-test-me-STAGING
    IA-test-me-PROD
    IA-test-STAGING-me
    IA-test-PROD-me
    

    What the code does is this: the key function creates the key used for comparison. In this case, we take the word and pad its right side with the highest character that we would expect to find in the string; that is the expression "\xFF" * (MAX_LENGTH - len(word)). It might seem strange to use the multiplication operator on a string, but it works: it repeats the string that many times, in this case the difference between the maximum string length and the length of the current string.

    In normal alphabetical sorting (like in a dictionary), a shorter word comes before any longer word that starts with it. Padding with the highest character reverses that for strings that match up to the end of the shorter one (say ia-test-me and ia-test-me-staging): the shorter string now sorts last, which becomes first once we reverse the whole list with reverse=True.
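
Putting the pieces together, here is a self-contained sketch of the padding approach that should run on both Python 2 and Python 3. The function name sort_reverse_shortest_first and its pad parameter are made up for this illustration, and it assumes the names contain only ASCII characters, so that "\xff" compares higher than any character in them:

```python
def sort_reverse_shortest_first(words, pad="\xff"):
    """Reverse lexicographic sort that puts a word before longer words it prefixes.

    Assumes `pad` compares higher than every character in `words`.
    """
    words = list(words)
    if not words:
        return []
    max_length = max(len(word) for word in words)

    def padded(word):
        # Right-pad with the highest expected character up to the longest length.
        return word + pad * (max_length - len(word))

    return sorted(words, key=padded, reverse=True)


# Example input from the question, in its original (random) order.
names = """
IA-test-PROD-me ia-test-prod-me ia-test-me-staging ia-test-me
ia-test-STAGING-me IA-test-me IA-test-me-staging ia-test-me-prod
IA-test-me-STAGING IA-test-me-prod IA-test-me-PROD IA-test-STAGING-me
""".split()

result = sort_reverse_shortest_first(names)
for name in result:
    print(name)
```

If the names can contain characters above U+00FF, the pad character would have to be chosen higher than any of them.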