
Change characters like á é í ó ú ñ to their unaccented equivalents in a DataFrame


I am working with some open data in Deepnote using the pandas library. Since the data is in Spanish, the DataFrame contains accents and characters like 'ñ'.

By searching, I managed to solve part of the problem by setting the 'encoding' parameter. The problem is that when I publish the page, accented characters like 'á é í ó ú ñ' show up as strange symbols. Is there a way to go through the columns that contain text and replace these characters with their unaccented equivalents?

import pandas as pd
datos = pd.read_csv("/work/avisos", delimiter=';', encoding="ISO-8859-1")
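Strange symbols in place of accented letters are typical of text being decoded with a different encoding than the one it was written in. Whether that is what happens on the published page is an assumption (the question doesn't say how the page is produced), and the sample string below is made up, but this standard-library sketch shows the effect:

```python
# Hypothetical sample text; the actual file contents are not shown in the question
text = "Año, canción"

# Bytes written as UTF-8 but read back as ISO-8859-1: each accented
# letter turns into two "strange signs"
garbled = text.encode("utf-8").decode("ISO-8859-1")
print(garbled)  # AÃ±o, canciÃ³n

# Decoding with the encoding the bytes were actually written in is lossless
assert text.encode("utf-8").decode("utf-8") == text
# This particular garbling can also be undone by reversing the steps
assert garbled.encode("ISO-8859-1").decode("utf-8") == text
```

If the symbols look like this, making the read and write encodings match may fix the display without removing any accents.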

Solution

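  • import unicodedata
    
    
    def remove_accents(x):
        # Leave non-string values (e.g. NaN from empty cells) untouched
        if not isinstance(x, str):
            return x
        # NFD splits 'á' into 'a' + a combining accent; encoding to ASCII
        # with 'ignore' then drops the combining accent
        return (unicodedata.normalize('NFD', x)
                           .encode('ascii', 'ignore')
                           .decode('utf-8'))
    
    
    # df is the DataFrame loaded with pd.read_csv (called datos in the question).
    # Apply the function only to text (object dtype) columns; on pandas >= 2.1,
    # DataFrame.map replaces the deprecated applymap.
    word_cols = df.select_dtypes(include='object').columns
    df[word_cols] = df[word_cols].applymap(remove_accents)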
    

    Adapted from: How to replace accented characters?


    That being said, you may only need the normalization step inside the function:

        return unicodedata.normalize('NFD', x)

    for the accents to appear as expected on the published page.
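A small end-to-end check of the accent-stripping approach, plus a look at what NFD alone does. The column names and rows are hypothetical stand-ins for the "avisos" data, and it uses Series.map per column (an equivalent swap for applymap that runs on any pandas version), so treat it as a sketch rather than the answer's exact code:

```python
import unicodedata

import pandas as pd


def remove_accents(x):
    # Leave non-strings (None/NaN, numbers) untouched
    if not isinstance(x, str):
        return x
    # NFD turns 'ñ' into 'n' + combining tilde; the ASCII encode drops the tilde
    return unicodedata.normalize("NFD", x).encode("ascii", "ignore").decode("ascii")


# Hypothetical sample rows standing in for the "avisos" data
df = pd.DataFrame({
    "comuna": pd.Series(["Ñuñoa", "Peñalolén", None], dtype=object),
    "n": [1, 2, 3],
})

# Select only the text (object dtype) columns, then map the function over each one
word_cols = df.select_dtypes(include="object").columns
df[word_cols] = df[word_cols].apply(lambda col: col.map(remove_accents))
print(df["comuna"].tolist()[:2])  # ['Nunoa', 'Penalolen']

# NFD on its own keeps the accent, just stored as base letter + combining mark
nfd = unicodedata.normalize("NFD", "á")
print(len(nfd), unicodedata.normalize("NFC", nfd) == "á")  # 2 True
```

The numeric column "n" is left alone because select_dtypes only picks object columns, and the missing value passes through the isinstance guard instead of raising a TypeError inside unicodedata.normalize.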