I have data in an Excel file (only one column) containing Japanese characters followed by fullwidth numbers. I want to convert these numbers into normal (halfwidth) numbers.
いつもありがとう８９０ございます
忙しい７ー１０ー１ところ
There are several rows like these.
What can I do so that these rows look like this:
いつもありがとう890ございます
忙しい7ー10ー1ところ
I tried this, but I am not sure if this is how it should be done:
s = unicodedata.normalize('NFKC', df.to_string())
Assuming an example like this, in which col1 is the column to process:
import pandas as pd

df = pd.DataFrame({'col1': ['いつもありがとう８９０ございます 忙しい７ー１０ー１ところ',
                            'いつもありがとう８９０ございます 忙しい７ー１０ー１ところ'],
                   'col2': [1, 2]
                  })
You can use apply:
import unicodedata
from functools import partial
df['col1'] = df['col1'].apply(partial(unicodedata.normalize, 'NFKC'))
Variant:
df['col1'] = df['col1'].apply(lambda s: unicodedata.normalize('NFKC', s))
Output:
col1 col2
0 いつもありがとう890ございます 忙しい7ー10ー1ところ 1
1 いつもありがとう890ございます 忙しい7ー10ー1ところ 2
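Note that NFKC normalizes more than digits: it also converts other compatibility characters, such as fullwidth Latin letters and halfwidth katakana. If only the digits should change, a translation table is one possible alternative. This is a minimal sketch, assuming that only the ten fullwidth digits (U+FF10 to U+FF19) need converting; the names FULLWIDTH_DIGITS and to_halfwidth_digits are made up for illustration:

```python
# Map each fullwidth digit (U+FF10..U+FF19) to its ASCII counterpart.
# Code points are used instead of literal characters so the table
# survives any editor or pipeline that normalizes the source text.
FULLWIDTH_DIGITS = {0xFF10 + i: str(i) for i in range(10)}


def to_halfwidth_digits(s):
    """Convert fullwidth digits to ASCII digits; leave everything else as-is."""
    return s.translate(FULLWIDTH_DIGITS)


# '\uff17' is fullwidth 7, '\uff11' is fullwidth 1, '\uff10' is fullwidth 0.
sample = '忙しい\uff17ー\uff11\uff10ー\uff11ところ'
print(to_halfwidth_digits(sample))
```

It can be applied to the column in the same way: df['col1'] = df['col1'].apply(to_halfwidth_digits). If full NFKC normalization is what you want, pandas also provides df['col1'].str.normalize('NFKC'), which leaves missing values as NaN, whereas apply with unicodedata.normalize raises a TypeError on them.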