Cleaning column names in pandas


I receive dataframes from a crawler and import them into a database for long-term storage. The problem I am running into is that many of these dataframes have column names containing uppercase letters and whitespace.

I have a fix for it, but I was wondering whether it can be done more cleanly than this:

def clean_columns(dataframe):
    # Rename one column at a time: lowercase it and replace spaces with underscores.
    for column in dataframe:
        dataframe.rename(columns={column: column.lower().replace(" ", "_")},
                         inplace=True)
    return dataframe

Usage:

[In] print(dataframe.columns)
[Out] Index(['Daily Foo', 'Weekly Bar'])

[In] dataframe = clean_columns(dataframe)
[In] print(dataframe.columns)
[Out] Index(['daily_foo', 'weekly_bar'])

Solution

  • You can assign to the columns attribute, using the vectorized string methods:

    df.columns = df.columns.str.lower().str.replace(' ', '_')

    OR

    pass a function to the rename() method:

    df = df.rename(columns=lambda x: x.lower().replace(' ', '_'))
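
If the crawler output is messier than the example (leading or trailing spaces, or runs of several spaces), the first one-liner can be extended a little. The following is only a sketch under some assumptions: every column name is a string, the sample column names are made up for illustration, and returning a copy rather than modifying the input is a design choice, not something the question requires.

```python
import pandas as pd


def clean_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with lowercase, underscore-separated column names."""
    cleaned = (
        df.columns
        .str.strip()                           # drop leading/trailing whitespace
        .str.lower()                           # normalise case
        .str.replace(r"\s+", "_", regex=True)  # collapse whitespace runs into one "_"
    )
    out = df.copy()        # leave the caller's dataframe untouched
    out.columns = cleaned
    return out


# Hypothetical sample data, not taken from the question
df = pd.DataFrame({"Daily Foo": [1, 2], " Weekly  Bar ": [3, 4]})
result = clean_columns(df)
print(list(result.columns))  # ['daily_foo', 'weekly_bar']
```

Note that the .str accessor raises an error if the columns are not all strings (for example, integer column labels), so in that case the rename() variant with str(x) inside the lambda may be the safer choice.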