python string list data-manipulation suffix

Manipulating items in a list, from a string then turning it back to a string


I applied for a data engineer job not too long ago and got a Python question where my answer didn’t pass all the edge cases. It has been haunting me ever since. I used .endswith() at the time, and I suspect that is what failed in my code.

I have been trying to recode it and here is what I have so far:

x = 'cars that ran up and opened a tattooaged car dealership educated'
# create a program to remove 'ed' from any word that ends with 'ed',
# but not from the word 'opened'
# also, every word must be less than 8 letters long

suffix= 'ed'

def check_ed_lt8(x):
    x_list=x.split(" ")
    for index,var in enumerate(x_list):
        if suffix in var != 'opened':
            new_word = var[:-len(suffix)].strip('suffix')
            x_list[index] = new_word
        elif len(var) >= 8:
            shorter_word = var[:8]
            x_list[index] = shorter_word
    return(' '.join(x_list))

print(check_ed_lt8(x))

I get the desired output:

cars that ran up and opened a tatooag car dealersh educat

But the technical question had examples before it, such as words ending in ‘ly’, and I started wondering whether I just had to loop through a list of suffixes, and whether that is why I didn’t pass the edge cases. So I modified my code, but now, every time I add to the list, I lose manipulation over one of the last items in the list:

suffixes = ['ed', 'an']
def check_ed_lt8(x):
    x_list=x.split(" ")
    for index,var in enumerate(x_list):
        for suffix in suffixes:
            if suffix in var != 'opened':
                new_word = var[:-len(suffix)].strip('suffix')
                x_list[index] = new_word
            elif len(var) >= 8:
                shorter_word = var[:8]
                x_list[index] = shorter_word
    return(' '.join(x_list))

print(check_ed_lt8(x))

Returns:

cars that r up a opened a tattoag car dealersh educated

In this output, I lost manipulation over the last item AND I didn’t mean for “and” to lose “nd”. I think it lost it because of a combination of “d” and “n” from each suffix, but I don’t know why.

The more items I put in the suffixes list, the more of the last few items I lose manipulation over. For example, if I add “ars” to the suffixes, the output becomes:

c that r up a opened a tattoag car dealership educated 

What am I doing wrong?


Solution

  • I would suggest using re.sub for removing the ed at the end. Here is a one-liner:

    import re
    x = 'cars that ran up and opened a tattoo aged car dealership educated'
    y = ' '.join([w if w == "opened" else re.sub(r'ed$', '', w)[:8] for w in x.split(' ')])
    

    If you want to remove multiple suffixes, extend your regexp accordingly:

    y = ' '.join([w if w == "opened" else re.sub(r'(ed|an)$', '', w)[:8] for w in x.split(' ')])
    

    Of course you can also build the regexp based on a list of suffixes:

    suffixes = ['ed','an']
    pattern = re.compile('(' + '|'.join(map(re.escape, suffixes)) + ')$')
    y = ' '.join([w if w == "opened" else pattern.sub('', w)[:8] for w in x.split(' ')])
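
As for why the loop version in the question misbehaves, a few things seem to combine. `suffix in var != 'opened'` is a chained comparison, equivalent to `(suffix in var) and (var != 'opened')`, and `in` matches the suffix anywhere in the word, so 'an' matches "and". The inner loop also keeps reading the original `var`, so for a later suffix the `elif` branch can overwrite `x_list[index]` with `var[:8]` and undo the earlier stripping (that is how "educated" comes back). Finally, `.strip('suffix')` strips the characters s, u, f, i, x rather than the suffix itself. Below is a minimal sketch of a loop-based alternative using `str.endswith`; the function name `strip_suffixes` and its parameters are made up for illustration, and it assumes at most one suffix is removed per word and that words in `keep` are left completely untouched.

```python
def strip_suffixes(text, suffixes=("ed", "an"), keep=("opened",), max_len=8):
    """Remove one trailing suffix per word, then cap the word at max_len chars."""
    words = []
    for word in text.split():
        if word in keep:
            # Assumption: protected words are returned unchanged.
            words.append(word)
            continue
        for suffix in suffixes:
            # endswith() only matches at the end of the word,
            # unlike `suffix in word`, which matches anywhere.
            if suffix and word.endswith(suffix):
                word = word[:-len(suffix)]
                break  # stop after the first suffix that matches
        # Truncate once, after the suffix check, so nothing gets overwritten.
        words.append(word[:max_len])
    return " ".join(words)


x = 'cars that ran up and opened a tattooaged car dealership educated'

print('an' in 'and')          # True: substring test matches anywhere
print('and'.endswith('an'))   # False: only the end of the word counts
print(strip_suffixes(x, suffixes=("ed",)))
# cars that ran up and opened a tattooag car dealersh educat
```

Note that with 'an' in the suffix list, "ran" still becomes "r", because it really does end in 'an'; whether that is wanted depends on the exact wording of the task.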