pythonpandasstringdataframereplace

Replace column values if it repeats the same character


I have a dataframe like

df = pd.DataFrame({'team': ['Red', 'Blue', 'Yellow', 'Green'],
               'points': [11, 7, 8, 10],
               'other': ["kk;aaa;;;", ";;bb;;", ";", ";;;;"]})

I would like to replace all cells that contain only ";" with "" (empty cells). There may be only 1 or many ";" in a row. If there is anything in the cells other than ";", I would like to leave them as is.

In my dataframe the other column would become:

 other
kk;aaa;;;
;;bb;;
             #(nothing)
             #(nothing)

Solution

  • You can use Series.replace with a regex pattern:

    df['other'] = df.other.replace(r'^;{1,}$','', regex=True)
    
    print(df)
    
         team  points      other
    0     Red      11  kk;aaa;;;
    1    Blue       7     ;;bb;;
    2  Yellow       8           
    3   Green      10           
    

    Explanation pattern ^;{1,}$:

    You can also use this on multiple columns, with df.replace. E.g.:

    df = pd.DataFrame({'team': ['Red', 'Blue', 'Yellow', 'Green'],
                   'points': [11, 7, 8, 10],
                   'other': ["kk;aaa;;;", ";;bb;;", ";", ";;;;"],
                   'other2': ["kk;aaa;;;", ";;bb;;", ";", ";;;;"]})
    
    cols = ['other','other2']
    
    df[cols] = df[cols].replace(r'^;{1,}$','', regex=True)
    
    print(df)
    
         team  points      other     other2
    0     Red      11  kk;aaa;;;  kk;aaa;;;
    1    Blue       7     ;;bb;;     ;;bb;;
    2  Yellow       8                      
    3   Green      10