[SOLVED] Replace column values if it repeats the same character

Replace column values if it repeats the same character

I have a dataframe like

df = pd.DataFrame({'team': ['Red', 'Blue', 'Yellow', 'Green'],
               'points': [11, 7, 8, 10],
               'other': ["kk;aaa;;;", ";;bb;;", ";", ";;;;"]})

I would like to replace all cells that contain only ";" with "" (empty cells). There may be only 1 or many ";" in a row. If there is anything in the cells other than ";", I would like to leave them as is.

In my dataframe the other column would become:

 other
kk;aaa;;;
;;bb;;
             #(nothing)
             #(nothing)

Solution

You can use Series.replace with a regex pattern:

df['other'] = df.other.replace(r'^;{1,}$','', regex=True)

print(df)

     team  points      other
0     Red      11  kk;aaa;;;
1    Blue       7     ;;bb;;
2  Yellow       8           
3   Green      10

Explanation pattern ^;{1,}$:

^ asserts position at start of the string
; matches the character ;
{1,} matches the previous token (i.e. ;) between one and unlimited times
$ asserts position at the end of the string

You can also use this on multiple columns, with df.replace. E.g.:

df = pd.DataFrame({'team': ['Red', 'Blue', 'Yellow', 'Green'],
               'points': [11, 7, 8, 10],
               'other': ["kk;aaa;;;", ";;bb;;", ";", ";;;;"],
               'other2': ["kk;aaa;;;", ";;bb;;", ";", ";;;;"]})

cols = ['other','other2']

df[cols] = df[cols].replace(r'^;{1,}$','', regex=True)

print(df)

     team  points      other     other2
0     Red      11  kk;aaa;;;  kk;aaa;;;
1    Blue       7     ;;bb;;     ;;bb;;
2  Yellow       8                      
3   Green      10