python pandas list ordered-set

Finding an intersection between two lists or dataframes while enforcing an ordering condition


I have two lists (columns from two separate pandas dataframes) and want to find the intersection of both lists while preserving the order, or ordering based on a condition. Consider the following example:

x = ['0 MO', '1 YR', '10 YR', '15 YR', '2 YR', '20 YR', '3 MO', '3 YR',
     '30 YR', '4 YR', '5 YR', '6 MO', '7 YR', '9 MO', 'Country']
y = ['Industry', '3 MO', '6 MO', '9 MO', '1 YR', '2 YR', '3 YR',
     '4 YR', '5 YR', '7 YR', '10 YR', '15 YR', '20 YR', '30 YR']

answer = set(x).intersection(y)

The variable answer contains the overlapping columns, but because it is a set, the order is not preserved. Is there a way to sort the result so that it yields:

answer = ['3 MO', '6 MO', '9 MO', '1 YR', '2 YR', '3 YR',
          '4 YR', '5 YR', '7 YR', '10 YR', '15 YR', '20 YR',
          '30 YR']

i.e., sorting the intersected list first by month ("MO") and its integer value, and then by year ("YR") and its integer value?

Alternatively, is there a pandas method to obtain the same result with two dataframes of overlapping columns (preserving or stating order)?


Solution

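  • You can use the built-in sorted function on the intersection and pass a custom function as the key. Since you want to sort first by whether a label is MO or YR and then by its integer value, split each label on whitespace and return a tuple of the second part (MO or YR) followed by the integer value of the first part. Tuples compare element by element, and "MO" sorts before "YR" as a string, so all months come first and each group is ordered numerically. This assumes every common label has the form "<integer> <unit>"; a label such as 'Country' would raise an error in the key function if it appeared in both lists.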

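    def sorter(label):
        # '10 YR' -> number='10', unit='YR'
        number, unit = label.split()
        # Sort by unit first ('MO' < 'YR'), then by the numeric value.
        return (unit, int(number))

    out = sorted(set(x).intersection(y), key=sorter)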

    Output:

    ['3 MO', '6 MO', '9 MO', '1 YR', '2 YR', '3 YR', '4 YR', '5 YR', '7 YR', '10 YR', '15 YR', '20 YR', '30 YR']
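
If one of the two lists already has the order you want (here y does, apart from 'Industry'), the question's "preserving order" variant can be handled without any sort key. The following is a minimal sketch, not the answer's method: the helper name ordered_intersection is hypothetical, and it assumes the first argument defines the desired order.

```python
def ordered_intersection(reference, other):
    """Return the items of `reference` that also occur in `other`,
    keeping the order (and any duplicates) of `reference`.

    Hypothetical helper for illustration; not part of pandas or the
    standard library.
    """
    lookup = set(other)  # set membership tests are O(1) on average
    return [item for item in reference if item in lookup]


x = ['0 MO', '1 YR', '10 YR', '15 YR', '2 YR', '20 YR', '3 MO', '3 YR',
     '30 YR', '4 YR', '5 YR', '6 MO', '7 YR', '9 MO', 'Country']
y = ['Industry', '3 MO', '6 MO', '9 MO', '1 YR', '2 YR', '3 YR',
     '4 YR', '5 YR', '7 YR', '10 YR', '15 YR', '20 YR', '30 YR']

# y already lists months before years in ascending order, so it is
# used as the reference that fixes the output order.
common = ordered_intersection(y, x)
print(common)
```

Because dataframe column indexes are iterable, the same helper should also accept df_y.columns and df_x.columns directly (df_x and df_y being placeholder names for the two dataframes). A pandas-only variant along the same lines would be df_y.columns[df_y.columns.isin(df_x.columns)], which filters df_y's columns with a boolean mask and therefore keeps their order.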