I'm working on a script to convert a data file from one format to another. I need to remove the special characters from the column headers.
I am using Pandas to read a CSV file with the below structure. I'm looking for a tidy way to remove the [units] form the column name.
Data File:
Date ,Time ,app1_sum [Ml] ,app1_q [l/s] ,app1_h [m] ,app1_a [m²] ,app1_v [m/s] ,app1_t_water [°C]
02.10.2024 ,19:05:00 ,57293.336 ,620.78 ,0.436 ,0.586 ,1.059 ,18.2
My goal is to have column names stripped down to their simplest form:
Date,Time,app1_sum,app1_q,app1_h,app1_a,app1_v,app1_t_water
My current approach is to remove the brackers first, then remove the remining units one at a time.
df.columns = df.columns.str.replace('[', '')
df.columns = df.columns.str.replace(']', '')
df.columns = df.columns.str.replace(' Ml', '')
I also tried to use regex to remove one unit at a time, this works but doesn't feel right.
df.rename(columns=lambda x: re.sub(r'\[Ml\]', '', x), inplace=True)
df.rename(columns=lambda x: re.sub(r'\[l/s\]', '', x), inplace=True)
df.rename(columns=lambda x: re.sub(r'\[m\]', '', x), inplace=True)
df.rename(columns=lambda x: re.sub(r'\[m²\]', '', x), inplace=True)
df.rename(columns=lambda x: re.sub(r'\[m/s\]', '', x), inplace=True)
df.rename(columns=lambda x: re.sub(r'\[°C\]', '', x), inplace=True)
You can use the following code to replace all brackets and their content in the columns names:
df.rename(columns=lambda x: re.sub(r'\[[^\]]+\]', '', x), inplace=True)
\[
will match the opening bracket [
[^\]]+
will match an abtritrary number of every character except ]
\]
will match the closing bracket
The \
are needed to escape the brackets since they are part of regex syntax