python pandas

Replacing cell values with a specific one


Good afternoon! I have a column with many unique values, and some of them are implicit duplicates. I need to replace every value that contains "system analyst..." with just "system analyst". For example, "1C system analyst", "system analyst (junior)" and "system analyst (pbi)" should all become "system analyst".

I wrote the following code, but the implicit duplicates are still there:

import re

data['name'] = data['name'].astype(str)
data['name'] = data['name'].apply(lambda x: re.sub(r'(system analyst).*', r'\1', x))
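To see why duplicates survive, here is a quick check of the same pattern with the standard re module on a few sample strings. The capitalized "System Analyst (pbi)" is a hypothetical variant, assumed here for illustration and not taken from the original data.

```python
import re

# The pattern from the question: keep "system analyst", drop whatever follows it
pattern = r'(system analyst).*'

# Sample values; "System Analyst (pbi)" is a hypothetical capitalized variant
samples = ["1C system analyst", "system analyst (junior)", "System Analyst (pbi)"]

# re.sub only rewrites the matched part, so text before the match is kept
results = [re.sub(pattern, r'\1', s) for s in samples]
print(results)
```

The match starts at "system analyst", so the "1C " prefix is left in place, and because the pattern is case-sensitive, the capitalized variant is not touched at all. Both cases leave implicit duplicates behind.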

Solution

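  • Use .loc with a boolean mask: select every row whose "name" contains "system analyst" (ignoring case) and assign the new value to all of those rows at once.

    data.loc[data["name"].str.contains("system analyst", case=False, na=False), "name"] = "system analyst"
    

    This should do the trick without needing apply. Unlike re.sub, it replaces the whole cell, so prefixes such as "1C " disappear too, and na=False keeps missing values from breaking the mask.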
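A minimal, self-contained sketch of the .loc approach on a small DataFrame. The sample values are hypothetical and stand in for the real "name" column; regex=False is an assumption that the search text is a plain substring, not a regex.

```python
import pandas as pd

# Hypothetical sample data standing in for the real "name" column
data = pd.DataFrame({"name": ["1C system analyst", "system analyst (junior)",
                              "System Analyst (pbi)", "data engineer"]})

# Boolean mask: True wherever "system analyst" occurs anywhere, ignoring case
mask = data["name"].str.contains("system analyst", case=False, regex=False, na=False)

# Overwrite the whole cell for every matching row in one vectorized assignment
data.loc[mask, "name"] = "system analyst"

print(data["name"].tolist())
```

Rows that do not match, such as "data engineer", are left unchanged. If a regex-only fix is preferred, the pattern has to consume the prefix as well, e.g. data["name"].str.replace(r".*system analyst.*", "system analyst", case=False, regex=True).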