I have a Pandas dataframe with 28 columns in total. Each one has a unique number after a name. I want to drop all the numbers from the columns but keep the name. How can I do that best?
Here is an example of the columns:
Miscellaneous group | 00002928 Alcoholic Beverages | 0000292 Animal fats group | 000029
I tried .rename()
already but to do this for 28 columns isn't efficient and is time consuming. It also creates a very long coding cell in Google Colab Notebook.
Assuming you're starting off with, e.g.
df.columns = ["Miscellaneous group | 00002928", "Alcoholic Beverages | 0000292", "Animal fats group | 000029"]
The simplest solution looks like it would be to use a list comprehension to iterate over the column names and split on the |
in your string and keep the first part of the resulting list, so:
df.columns = [col.split(" | ")[0] for col in columns]
This returns:
['Miscellaneous group', 'Alcoholic Beverages', 'Animal fats group']
Alternatively, you could do this with a regex:
import re
df.columns = [re.sub(r'\s*\|.*', '', col) for col in columns]
This looks for a string that begins with whitespace, followed by |
, followed by anything and replaces it all with an empty string.
Final alternative:
columns = [re.sub(r'\s*\d+$', '', s) for s in columns]
This looks for whitespace followed by digits at the end of each string, so this would remove the trailing digits regardless of what preceded them (in case the |
isn't always present), so it would produce:
['Miscellaneous group |', 'Alcoholic Beverages |', 'Animal fats group |']