I have a Dataframe with hundreds of columns and very long column-names. I want to remove any text in the column-names after the ":" or "." This is basically splitting on multiple delimiters.
I tried to extract column-names in a list and use the split method at ":" and "." and then keep only the portion of the text before ":" or "." but the split did not work as I wanted. I do not know why. any idea how to fix it and achieve my goal.
data = {'Name of the injured. Bla bla bla': ['Bill', 'John'],
'Age of the injured: bla bla': [50,40],
}
df_data = pd.DataFrame.from_dict(data)
print(df_data)
cols = df_data.columns.values
new_cols = [( x.split(':') or x.split('.') ) for x in cols]
print(new_cols)
This is the outcome that I need:
Thanks, GR
You could use regular expression re
in order to split the column name strings, and then reassing them to the dataframe :
import pandas as pd
import re
data = {
'Name of the injured. Bla bla bla': ['Bill', 'John'],
'Age of the injured: bla bla': [50, 40]
}
df_data = pd.DataFrame.from_dict(data)
then, following your intuition, you could do:
cols = df_data.columns.values
new_cols = [re.split(r'[.:]', item)[0] for item in cols]
df_data.columns = new_cols
If you have some other charcaters to split your text, just add it between the brackets, and here we choose to keep only the first part, hence the [0]
.
The result for df_data
is as expected:
Name of the injured Age of the injured
0 Bill 50
1 John 40