I'm currently working on a project where I need to perform conditional replacements in a Pandas DataFrame. I've implemented a solution, but I'm wondering if there's a more efficient way to achieve the same result.
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
}
df_init = pd.DataFrame(data)
# Using `.loc`
df_init['Group'] = ['A', 'B', 'C', 'D', 'E']
df_init.loc[df_init.loc[
(df_init.City=='New York')
&(df_init.Name=='Alice')].index, 'City'] = 'Hamburg'
# Using method chaining
def _replace(dataframe):
    dataframe.loc[dataframe.loc[
        (dataframe.City=='New York')
        &(dataframe.Name=='Alice')].index, 'City'] = 'Hamburg'
    return dataframe
(df_init
.assign(Group=['A', 'B', 'C', 'D', 'E'])
.pipe(_replace)
)
In the first method, I'm using the .loc method to locate the row where the conditions are met and then perform the replacement. In the second approach, I'm using method chaining with .assign and .pipe to achieve the same result.
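A side note on the two approaches above: .loc accepts a boolean mask directly, so the inner df.loc[...].index lookup is not required. The sketch below also copies the frame inside the piped function so that calling it does not modify its input in place. The helper name replace_city is hypothetical, chosen only for illustration.

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
}
df_init = pd.DataFrame(data)

def replace_city(dataframe):
    # Hypothetical helper: work on a copy so the caller's frame is untouched
    out = dataframe.copy()
    cond = (out['City'] == 'New York') & (out['Name'] == 'Alice')
    # .loc[boolean_mask, column] selects the rows where the mask is True
    out.loc[cond, 'City'] = 'Hamburg'
    return out

result = (df_init
          .assign(Group=['A', 'B', 'C', 'D', 'E'])
          .pipe(replace_city))
print(result)
```

Since .assign already returns a new DataFrame, the .copy() only matters if replace_city is called on df_init directly; it is there as a precaution, not a requirement of .pipe.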
My question is: is there a more efficient way to replace the .loc method with method chaining in pandas DataFrame operations? I should mention that I am very new to method chaining in pandas.
Code
Build the condition, then use Series.mask to replace the values where the condition is True.
cond = df_init['City'].eq('New York') & df_init['Name'].eq('Alice')
out = df_init.assign(
    Group=['A', 'B', 'C', 'D', 'E'],
    City=df_init['City'].mask(cond, 'Hamburg')
)
out
      Name  Age         City Group
0    Alice   25      Hamburg     A
1      Bob   30  Los Angeles     B
2  Charlie   35      Chicago     C
3    David   40      Houston     D
4     Emma   45      Phoenix     E
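The mask call above still refers to df_init from outside the chain. DataFrame.assign also accepts callables, which are called with the DataFrame being assigned to, so the condition can be evaluated inside the chain itself. A minimal sketch, assuming the same data dictionary as in the question:

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
}

out = (
    pd.DataFrame(data)
    .assign(
        Group=['A', 'B', 'C', 'D', 'E'],
        # The lambda receives the intermediate DataFrame, not df_init,
        # so no outside variable is needed for the condition
        City=lambda d: d['City'].mask(
            d['City'].eq('New York') & d['Name'].eq('Alice'), 'Hamburg'
        ),
    )
)
print(out)
```

Series.where is the mirror image of mask: mask replaces values where the condition is True, while where keeps values where it is True and replaces the rest.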