pythonpandasrename

Dropping different values from multiple columns


I have a Pandas dataframe with 28 columns in total. Each one has a unique number after a name. I want to drop all the numbers from the columns but keep the name. How can I do that best?

Here is an example of the columns:

Miscellaneous group | 00002928  Alcoholic Beverages | 0000292   Animal fats group | 000029

I tried .rename() already but to do this for 28 columns isn't efficient and is time consuming. It also creates a very long coding cell in Google Colab Notebook.


Solution

  • Assuming you're starting off with, e.g.

    df.columns = ["Miscellaneous group | 00002928",  "Alcoholic Beverages | 0000292",   "Animal fats group | 000029"]
    

    The simplest solution looks like it would be to use a list comprehension to iterate over the column names and split on the | in your string and keep the first part of the resulting list, so:

    df.columns = [col.split(" | ")[0] for col in columns]
    

    This returns:

    ['Miscellaneous group', 'Alcoholic Beverages', 'Animal fats group']
    

    Alternatively, you could do this with a regex:

    import re
    
    df.columns = [re.sub(r'\s*\|.*', '', col) for col in columns]
    

    This looks for a string that begins with whitespace, followed by |, followed by anything and replaces it all with an empty string.

    Final alternative:

    columns = [re.sub(r'\s*\d+$', '', s) for s in columns]
    

    This looks for whitespace followed by digits at the end of each string, so this would remove the trailing digits regardless of what preceded them (in case the | isn't always present), so it would produce:

    ['Miscellaneous group |', 'Alcoholic Beverages |', 'Animal fats group |']