python list indexing

Unexpected list index affects word option for difflib.get_close_matches()


Question: To get the output I need, I have to index the tokens like this in my difflib call: difflib.get_close_matches(tokens[0], jobList, n=1, cutoff=0.85). If I use tokens[j], which is what I would expect to work, the token Asst still appears before the address Wyndrum in the output. Why?

# Short test: remove job descriptions from the front of address strings
testList = ['21 Sharp Crescent _Wainuiomata Shop Asst','Shop Asst Wyndrum Avenue _Lower_Hutt Housewife','Housewife']
jobList = ['Asst','Housewife','Shop']

import difflib

newList = []
for i in range(len(testList)):
    tokens = testList[i].split()
    for j in range(len(tokens)):
        print("tokens[j]",tokens[j],"tokens[0]",tokens[0])
        result = difflib.get_close_matches(tokens[0], jobList, n=1, cutoff=0.85)
        if result:
            while tokens and tokens[0] == result[0]:
                tokens.pop(0)
        else:
            newString = ' '.join(tokens)
            newList.append(newString)
            break

for i in range(len(newList)):
    print(newList[i])

Expected/Correct Output

21 Sharp Crescent _Wainuiomata Shop Asst
Wyndrum Avenue _Lower_Hutt Housewife

Debug print lines

tokens[j] 21 tokens[0] 21
tokens[j] Shop tokens[0] Shop
tokens[j] Wyndrum tokens[0] Asst
tokens[j] _Lower_Hutt tokens[0] Wyndrum
tokens[j] Housewife tokens[0] Housewife

Solution

  • A general rule in Python: don't remove elements from a list while a for loop is iterating over it, so avoid remove() and pop() on that list inside the loop. Work on a copy of the original list, or build a new list containing only the elements you want to keep.

    When you remove an element, the elements after it shift down and their indexes change. The loop does not know about the shift, so it skips some elements. In your code, with tokens[j], pop(0) removes Shop and Asst moves to index 0, but j has already advanced to 1, so tokens[j] is Wyndrum and Asst is never checked.

    Iterate over a copy of tokens, created with tokens.copy():
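
The index shift can be reproduced in isolation. The sketch below is a simplified stand-in for the question's loop, not the original code: the function name scan_while_popping is hypothetical, and it uses exact set membership instead of difflib so that only the indexing effect is visible.

```python
def scan_while_popping(tokens, jobs):
    """Index into a list by j while popping from its front.

    Returns the tokens the loop actually looked at and the tokens left over.
    """
    tokens = list(tokens)          # local copy so the caller's list is untouched
    visited = []
    for j in range(len(tokens)):   # the range is fixed before any pop happens
        if j >= len(tokens):       # the list shrank underneath the loop
            break
        visited.append(tokens[j])
        if tokens[j] in jobs:
            tokens.pop(0)          # every later element shifts left by one
    return visited, tokens


visited, remaining = scan_while_popping(
    ['Shop', 'Asst', 'Wyndrum', 'Avenue'], {'Shop', 'Asst'}
)
print(visited)    # ['Shop', 'Wyndrum', 'Avenue']
print(remaining)  # ['Asst', 'Wyndrum', 'Avenue']
```

Asst never shows up in visited: it moved to index 0 at the moment j moved on to 1, which is why it survives in front of Wyndrum.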

    tokens = text.split()
    copy = tokens.copy()   # <-- create copy
    
    for j in range(len(copy)):  # <-- use copy
        print("tokens[j]", copy[j], "tokens[0]", tokens[0]) # <-- use copy
    

    Full working code with other changes:

    import difflib
    
    # PEP8: `lower_case_names` for variables
    test_list = [
        '21 Sharp Crescent _Wainuiomata Shop Asst',
        'Shop Asst Wyndrum Avenue _Lower_Hutt Housewife',
        'Housewife'
    ]
    job_list = ['Asst', 'Housewife', 'Shop']
    
    new_list = []  # PEP8: `lower_case_names` for variables
    
    for text in test_list:
        print(f'\n>>> text: {text} <<<\n')
        
        tokens = text.split()
        copy = tokens.copy()
        
        # loop over the copy of tokens
        for j in range(len(copy)):
        #for j, tok in enumerate(tokens.copy()):
            print(f"tokens[j]: {copy[j]:10} | tokens[0]: {tokens[0]}")
            
            #result = difflib.get_close_matches(tokens[0], job_list, n=1, cutoff=0.85)
            result = difflib.get_close_matches(copy[j], job_list, n=1, cutoff=0.85)
            
            if result:
                if tokens[0] == result[0]:
                    print('   remove:', tokens.pop(0))
            else:
                break
    
        new_list.append(' '.join(tokens))
    
    print('\n--- results ---\n')
    
    for old, new in zip(test_list, new_list):
        print(old, '--->', new)
    

    Result:

    >>> text: 21 Sharp Crescent _Wainuiomata Shop Asst <<<
    
    tokens[j]: 21         | tokens[0]: 21
    
    >>> text: Shop Asst Wyndrum Avenue _Lower_Hutt Housewife <<<
    
    tokens[j]: Shop       | tokens[0]: Shop
       remove: Shop
    tokens[j]: Asst       | tokens[0]: Asst
       remove: Asst
    tokens[j]: Wyndrum    | tokens[0]: Wyndrum
    
    >>> text: Housewife <<<
    
    tokens[j]: Housewife  | tokens[0]: Housewife
       remove: Housewife
    
    --- results ---
    
    21 Sharp Crescent _Wainuiomata Shop Asst ---> 21 Sharp Crescent _Wainuiomata Shop Asst
    Shop Asst Wyndrum Avenue _Lower_Hutt Housewife ---> Wyndrum Avenue _Lower_Hutt Housewife
    Housewife ---> 
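
The other option mentioned at the start, building a new list instead of mutating the one being scanned, could look like the sketch below. It is an alternative, not the code above: the helper name strip_leading_jobs is hypothetical, and it uses itertools.dropwhile from the standard library. One behavioural difference to be aware of: it drops any leading token that get_close_matches accepts at the given cutoff, whereas the code above only pops a token that is exactly equal to the match.

```python
import difflib
from itertools import dropwhile


def strip_leading_jobs(text, jobs, cutoff=0.85):
    """Return text without the leading tokens that closely match a job name."""
    def is_job(token):
        # get_close_matches returns a (possibly empty) list of matches
        return bool(difflib.get_close_matches(token, jobs, n=1, cutoff=cutoff))

    # dropwhile skips tokens while is_job is true, then yields all the rest
    return ' '.join(dropwhile(is_job, text.split()))


job_list = ['Asst', 'Housewife', 'Shop']
print(strip_leading_jobs('Shop Asst Wyndrum Avenue _Lower_Hutt Housewife', job_list))
# Wyndrum Avenue _Lower_Hutt Housewife
```

Because nothing is removed from a list that is being indexed, there is no index shift to account for.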
    

    PEP 8 -- Style Guide for Python Code