python pandas replace

Pandas replace multiple substring patterns via dictionary


Suppose we want to replace multiple substrings via pd.Series.replace or pd.DataFrame.replace by passing a dictionary to the to_replace argument, with regex=True so that the keys match substrings rather than whole values.

Example:

Replace

in the string 'Nana likes bananas and ananas'.


Solution

  • Let's try a short example:

    import pandas as pd
    
    s = pd.Series(['abcde', 'bcde', 'xyz'])
    
    s.replace(to_replace={'ab': 'xy', 'bc': 'BC', 'cd': 'CD', 'xy': 'XY'}, regex=True)
    
    0    xyCDe
    1     BCde
    2      XYz
    dtype: object
    
    # let's swap the first two keys
    s.replace(to_replace={'bc': 'BC', 'ab': 'xy', 'cd': 'CD', 'xy': 'XY'}, regex=True)
    
    0    aBCde
    1     BCde
    2      XYz
    dtype: object
    
    # overlapping regex, with lookarounds
    pd.Series(['abcde']).replace(to_replace={'a(?=b)': 'A', '(?<=b)c': 'C'}, regex=True)
    0    AbCde
    dtype: object
    
    # overlapping regex in which the first pattern breaks the second one
    pd.Series(['abcde']).replace(to_replace={'ab': 'A', '(?<=b)c': 'C'}, regex=True)
    0    Acde
    dtype: object
    
    # overlapping pattern in which the replacement preserves the second pattern
    pd.Series(['abcde']).replace(to_replace={'ab': 'Ab', '(?<=b)c': 'C'}, regex=True)
    0    AbCde
    dtype: object
    
    # overlapping pattern in which the replacement creates the second pattern
    # (the newly created 'x' is not picked up: 'c' stays lowercase)
    pd.Series(['abcde']).replace(to_replace={'ab': 'Ax', '(?<=x)c': 'C'}, regex=True)
    0    Axcde
    dtype: object
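
The outputs above are consistent with a two-step reading: each key is first tested against the original string, and then only the keys that matched are applied one after another, in dictionary order, to the progressively modified string. This is an inference from the results shown, not a statement about pandas internals, and other pandas versions may behave differently. The sketch below models that reading with the standard re module only; the helper name replace_like_observed is made up for illustration.

```python
import re


def replace_like_observed(text, mapping):
    """Model of the behaviour seen above (an assumption, not pandas' code).

    1. Decide which patterns match the ORIGINAL text.
    2. Apply only those patterns, in dict order, to the evolving result.
    """
    # Step 1: patterns are checked against the untouched input.
    matched = [pattern for pattern in mapping if re.search(pattern, text)]

    # Step 2: substitutions run sequentially, so an earlier replacement
    # can destroy a later match, but cannot enable a pattern that did not
    # match the original text.
    result = text
    for pattern in matched:
        result = re.sub(pattern, mapping[pattern], result)
    return result


if __name__ == "__main__":
    mapping = {'ab': 'xy', 'bc': 'BC', 'cd': 'CD', 'xy': 'XY'}
    for value in ['abcde', 'bcde', 'xyz']:
        print(value, '->', replace_like_observed(value, mapping))
    # abcde -> xyCDe
    # bcde -> BCde
    # xyz -> XYz
```

A plain chain of re.sub calls would instead turn 'abcde' into 'XYCDe' in the first example and 'AxCde' in the last one, which is not what the outputs show.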