I noticed that it's possible to use df.rename(columns=str.lower), but not df.rename(columns=str.replace(" ", "_")).
Is this because it is allowed to pass the variable that stores the method (str.lower), but not allowed to actually call the method (str.lower())?
There is a similar question asking why the error message of df.rename(columns=str.replace(" ", "_")) is rather confusing, but it has no answer.
Is it possible to use methods of the .str accessor (of pd.DataFrame().columns) inside df.rename(columns=...)?
The only solution I have come up with so far is
df = df.rename(columns=dict(zip(df.columns, df.columns.str.replace(" ", "_"))))
but maybe there is something more consistent and similar in style to df.rename(columns=str.lower)? I know df.rename(columns=lambda x: x.replace(" ", "_")) works, but it doesn't use the .str accessor of pandas columns; it uses Python's built-in str.replace().
The purpose of the question is to explore the possibilities of using pandas str methods when renaming columns in a method chain, which is why df.columns = df.columns.str.replace(' ', '_') is not suitable for me.
As an example df, assume:
df = pd.DataFrame([[0,1,2]], columns=["a pie", "an egg", "a nut"])
df.rename accepts a function object (or other callable).
In the first case, str.lower is a function. However, str.replace(" ", "_") calls the function and evaluates to the result; in this case the call itself is incorrect, so it raises an error. But you don't want to pass the result of calling the function, you want to pass the function itself.
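To make the difference concrete, here is a small sketch in plain Python (no pandas needed). The names lower and error are only illustrative. It shows that referencing str.lower yields a callable, while str.replace(" ", "_") is an actual call on the class: " " is taken as the string itself and "_" as the old argument, so the required new argument is missing.

```python
# Referencing the method gives a callable; nothing runs yet.
lower = str.lower
print(callable(lower))   # True
print(lower("A Pie"))    # a pie

# Calling str.replace on the class treats " " as the string itself and "_"
# as the `old` argument, so the required `new` argument is missing.
error = None
try:
    str.replace(" ", "_")
except TypeError as exc:
    error = exc          # keep a reference; `exc` is cleared after the block

print(type(error).__name__)  # TypeError
```

So the exception is raised before df.rename is ever entered, which is probably why the message says nothing about columns.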
So, something like:
def space_to_underscore(col):
    return col.replace(" ", "_")
df.rename(columns=space_to_underscore)
Or, use a lambda expression:
df.rename(columns=lambda col: col.replace(" ", "_"))
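If the aim is something closer in style to columns=str.lower, one option not mentioned above is operator.methodcaller from the standard library, which builds a callable that invokes a named method with fixed arguments. A minimal sketch, where to_underscores and labels are illustrative names:

```python
from operator import methodcaller

# Builds a callable equivalent to: lambda s: s.replace(" ", "_")
to_underscores = methodcaller("replace", " ", "_")

labels = ["a pie", "an egg", "a nut"]
renamed = [to_underscores(label) for label in labels]
print(renamed)  # ['a_pie', 'an_egg', 'a_nut']
```

Because to_underscores is an ordinary callable, df.rename(columns=to_underscores) should behave the same way as the lambda version. It still uses the built-in str.replace, not the .str accessor.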
Note that df.rename(columns=str.lower) doesn't use the .str accessor either; it uses the built-in str method. So I think you are confused.
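One way to check this is to pass a function that records what it receives. The sketch below assumes pandas is installed; record_and_lower and seen are hypothetical names used only for the demonstration.

```python
import pandas as pd

df = pd.DataFrame([[0, 1, 2]], columns=["a pie", "an egg", "a nut"])

seen = []  # type names of the labels handed to the callable

def record_and_lower(label):
    # rename passes one column label at a time, not the whole Index
    seen.append(type(label).__name__)
    return label.lower()

renamed = df.rename(columns=record_and_lower)
print(set(seen))  # {'str'}
```

Each label arrives as a plain Python str, so whatever callable is passed works on individual strings rather than on df.columns.str.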
Now, you can use the .str accessor on the column index object, so:
df.columns.str.replace(" ", "_")
But then you would need to do what you already said you didn't want to do:
df.columns = df.columns.str.replace(" ", "_")
It is important to point out that this mutates the original dataframe object in place, as opposed to df.rename, which returns a new dataframe object. It isn't clear why you want to use the .str accessor; is that the reason?
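If the goal really is to keep the .str accessor inside a method chain, one possibility (not part of the answer above, and assuming a reasonably recent pandas, 1.0 or later) is DataFrame.set_axis, which returns a new dataframe with replaced labels. A minimal sketch; result is an illustrative name:

```python
import pandas as pd

df = pd.DataFrame([[0, 1, 2]], columns=["a pie", "an egg", "a nut"])

result = (
    df
    # .pipe gives access to the intermediate frame's own columns
    .pipe(lambda d: d.set_axis(d.columns.str.replace(" ", "_", regex=False), axis=1))
    # further chained steps keep working on the new frame
    .rename(columns=str.upper)
)

print(list(result.columns))  # ['A_PIE', 'AN_EGG', 'A_NUT']
print(list(df.columns))      # ['a pie', 'an egg', 'a nut']
```

The original df keeps its column names, so this avoids the in-place assignment while still going through df.columns.str.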