python-3.x regex pandas string dataframe

Replace specific Chinese punctuation marks with their English equivalents in Python


Given a test dataset as follows:

   id                   company
0   1                  xyz,ltd。
1   2  wall street english (bj)
2   3                 James(sh)
3   4                       NaN
4   5                    黑石(上海)

I need to replace the Chinese punctuation marks with the corresponding English ones: ( with (, ) with ), 。 with . and , with , (each full-width mark with its ASCII equivalent).

I tried dd.company.str.replace('(', '(').replace(')', ')').replace('。', '.').replace(',', ','), but it is not a Pythonic solution and it does not work either.

Out:

0                    xyz,ltd。
1    wall street english (bj)
2                   James(sh)
3                         NaN
4                      黑石(上海)
Name: company, dtype: object

How could I replace them correctly? Thanks a lot.


Solution

  • One idea is to use two lists or a dictionary and pass regex=True for substring replacement:

    dd.company.replace(['(', ')', '。', ','], ['(', ')', '.', ','], regex=True)

    dd.company.replace({'(': '(', ')': ')', '。': '.', ',': ','}, regex=True)
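The dictionary approach above can be sketched end to end as follows. This is a minimal, self-contained example under stated assumptions: the sample frame is hypothetical (it guesses which rows contain full-width marks), and the full-width characters are written as Unicode escapes so they cannot be confused with their ASCII look-alikes. The `str.translate` variant at the end is an alternative that is not part of the answer above; it may be convenient when every replacement is a single character.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data modelled on the question; which rows hold
# full-width punctuation is an assumption made for illustration.
dd = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "company": [
        "xyz\uff0cltd\u3002",                    # full-width comma, ideographic full stop
        "wall street english (bj)",             # already ASCII, should stay unchanged
        "James\uff08sh\uff09",                  # full-width parentheses
        np.nan,                                 # missing value, should stay missing
        "\u9ed1\u77f3\uff08\u4e0a\u6d77\uff09",  # Chinese name with full-width parentheses
    ],
})

# Full-width mark -> ASCII equivalent.
mapping = {"\uff08": "(", "\uff09": ")", "\u3002": ".", "\uff0c": ","}

# Series.replace matches whole cell values by default;
# regex=True makes it replace substrings instead.
fixed = dd["company"].replace(mapping, regex=True)

# Alternative (not from the answer above): a character translation table.
table = str.maketrans(mapping)
fixed_translate = dd["company"].str.translate(table)

print(fixed)
```

One likely reason the chained attempt in the question has no effect: only the first call is `.str.replace`; the later `.replace` calls are `Series.replace`, which compares whole cell values unless `regex=True` is passed. The result also has to be assigned back, for example `dd["company"] = fixed`.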