Tags: python, list, levenshtein-distance, custom-lists, multiple-matches

How can I measure how well the order of two Python lists matches, so that one misplaced element does not make all the remaining comparisons fail?


I have 2 python lists to compare.

list1 = ['13.3. Risk', '13.3.1. Process', 'Change']
list2 = ['Change', '13.3. Risk', '13.3.1. Process']

I want to measure how closely the order of the elements matches.

If I go item by item, the match is 0%, since the first one fails and every later element is shifted by one position.

But if you look carefully, only the first element is out of place and the rest are in order. So the match (or, better put, the accuracy/precision) should be 66.66%.

I have tried 3 things:

Element by element

coincidences = [i == j for i, j in zip(list1, list2)]
percentage = 100 * sum(coincidences) / len(list1)

This results in 0% in this example.

Levenshtein distance

I convert each list to a string with join and calculate the Levenshtein distance between the two strings.

from Levenshtein import distance

str1 = ','.join(list1)
str2 = ','.join(list2)

lev_dist = distance(str1, str2)

percentage = 100 * (1 - lev_dist / max(len(str1), len(str2)))

This results in 39.80582524271845%.

Spearman coefficient

from scipy.stats import spearmanr

pos_list1 = {elem: i for i, elem in enumerate(list1)}
# Rank of each list2 element according to its position in list1
range_list2 = [pos_list1[elem] for elem in list2]

coef, p_value = spearmanr(list(range(len(list1))), range_list2)
print(f'Spearman coef is: {coef}')

This results in -0.5.

So, as you can see, I don't get the expected 66.66%. Is there another way of doing this?


Solution

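  • You can calculate the Levenshtein distance between the lists themselves, not between their concatenated strings, so that each list element counts as a single symbol:

    from Levenshtein import distance

    lev_dist = distance(list1, list2)
    
    percentage = 100 * (1 - lev_dist / (len(list1) + len(list2)))
    

    Here lev_dist is 2 ('Change' is removed at one end and inserted at the other), so percentage is 100 * (1 - 2/6), i.e. 66.66666666666666.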
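
If installing the Levenshtein package is not an option, the same idea can be sketched in plain Python. The block below is only an illustration, not the package's implementation: list_levenshtein and order_similarity are hypothetical helper names, and the normalization by len(a) + len(b) simply copies the formula from the answer.

```python
def list_levenshtein(a, b):
    """Edit distance between two sequences, comparing whole elements."""
    # prev[j] is the distance between the already processed prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to an empty prefix of b
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(
                prev[j] + 1,         # delete x
                curr[j - 1] + 1,     # insert y
                prev[j - 1] + cost,  # keep (cost 0) or substitute (cost 1)
            ))
        prev = curr
    return prev[-1]


def order_similarity(a, b):
    """Percentage score, normalized by len(a) + len(b) as in the answer."""
    total = len(a) + len(b)
    if total == 0:
        return 100.0  # assumption: two empty lists count as identical
    return 100 * (1 - list_levenshtein(a, b) / total)


list1 = ['13.3. Risk', '13.3.1. Process', 'Change']
list2 = ['Change', '13.3. Risk', '13.3.1. Process']

dist = list_levenshtein(list1, list2)
score = order_similarity(list1, list2)
print(dist, round(score, 2))  # 2 66.67
```

Because elements are compared as whole items, a long label such as '13.3.1. Process' weighs the same as a short one such as 'Change', which is why this score differs from the 39.8% obtained on the joined strings.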