I have a pandas data frame.
keyword  adGroup    goal6Value  adCost
aaaa     (not set)  0           0.0
+bbbb    (not set)  0           0.0
+cccc    (not set)  2072        0.0
dddd     (not set)  0           0.0
I changed the values in the first column to put brackets around the keywords based on a condition (if there is no "+" symbol, add brackets):
keyword  adGroup    goal6Value  adCost
[aaaa]   (not set)  0           0.0
+bbbb    (not set)  0           0.0
+cccc    (not set)  2072        0.0
[dddd]   (not set)  0           0.0
This is the function I wrote to add the brackets:
import pandas as pd

def add_bracket(df):
    # Make sure every keyword is a string before testing for "+"
    df["keyword"] = df["keyword"].astype('str')
    keyword_list = list()
    for index, row in df.iterrows():
        keyword = row["keyword"]
        # Wrap the keyword in brackets only when it contains no "+"
        if keyword.find("+") < 0:
            keyword = "[" + keyword + "]"
        keyword_list.append(keyword)
    kw = pd.DataFrame(keyword_list, columns=['Keyword2'])
    # Swap the old column for the new one (drop takes columns=[...])
    df2 = pd.concat([df, kw], axis=1).drop(columns=["keyword"]).rename(columns={'Keyword2': 'keyword'})
    df2 = df2[['keyword', 'adGroup', 'goal6Value', 'adCost']]
    return df2
The function produces the result I want, but is there a neater way in pandas, so that I don't need to create df2 just to replace the first column (basically making the change in place)?
Solution: Based on @Inder's suggested answer, this whole function can be written in one line.
df["keyword"] = df.keyword.apply(lambda x: "[" + x + "]" if x.find("+") < 0 else x)
Or, based on @RafaelC's answer, with a boolean mask:
mask = df.keyword.str.contains('+', regex=False)
df.loc[~mask, 'keyword'] = "[" + df.loc[~mask, 'keyword'] + "]"
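For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the sample frame from the question and applies the mask approach in place. The name has_plus is just an illustrative variable, and the astype(str) call mirrors the original function; it is an assumption that the column holds no missing values (otherwise str.contains would return NaN for those rows and the ~ negation would fail unless na=False is passed).

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "keyword": ["aaaa", "+bbbb", "+cccc", "dddd"],
    "adGroup": ["(not set)"] * 4,
    "goal6Value": [0, 0, 2072, 0],
    "adCost": [0.0, 0.0, 0.0, 0.0],
})

# Same cast as in the original add_bracket function
df["keyword"] = df["keyword"].astype(str)

# True for rows whose keyword contains a literal "+" (no regex)
has_plus = df["keyword"].str.contains("+", regex=False)

# Add brackets only to the rows without a "+", modifying df directly
df.loc[~has_plus, "keyword"] = "[" + df.loc[~has_plus, "keyword"] + "]"

print(df["keyword"].tolist())
# ['[aaaa]', '+bbbb', '+cccc', '[dddd]']
```

No second frame is created, and the column order is untouched because only the values of the existing column are overwritten.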
(Assuming you're doing this to have a list-like view, and not to work with list objects)
Just add the strings together, but only on the rows that have no "+":
mask = df.keyword.str.contains('+', regex=False)
df.loc[~mask, 'keyword'] = "[" + df.loc[~mask, 'keyword'] + "]"
  keyword
0  [aaaa]
1   +bbbb
2   +cccc
3  [dddd]
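If you prefer a single expression over the two-step .loc assignment, the same idea can probably be written with Series.where, which keeps the values where the condition is True and takes the replacement elsewhere. This is a variation on the approach above rather than part of it, and the names keywords, has_plus and bracketed are only for illustration.

```python
import pandas as pd

# Keyword column from the question
keywords = pd.Series(["aaaa", "+bbbb", "+cccc", "dddd"], name="keyword")

# True where the keyword already contains a literal "+"
has_plus = keywords.str.contains("+", regex=False)

# Keep keywords with "+", otherwise use the bracketed version
bracketed = keywords.where(has_plus, "[" + keywords + "]")

print(bracketed.tolist())
# ['[aaaa]', '+bbbb', '+cccc', '[dddd]']
```

Note that this builds the bracketed string for every row and then picks per row, so it does slightly more work than the .loc version, but it is still vectorized.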
Why is this better than apply?
Take a look at the timings:
%timeit "[" + df.loc[mask, 'keyword'] + "]"
348 µs ± 24.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit df.keyword.apply(lambda x:[x])
112 µs ± 3.46 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Whoa, so apply is faster?
Not quite. Maybe on a very small df, but take a look at the same operation on a bigger df with 100,000 times more rows:
df = pd.concat([df]*100000)
%timeit "[" + df.loc[mask, 'keyword'] + "]"
4.54 ms ± 135 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit df.keyword.apply(lambda x:[x])
129 ms ± 2.74 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
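So apply gets very slow very fast as the data grows (here roughly 1,150 times slower for 100,000 times more rows), while the vectorized operation scales far better (only about 13 times slower).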
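To reproduce this kind of comparison outside IPython, here is a rough sketch using the standard timeit module. It times the two complete solutions (conditional apply versus boolean mask) rather than the exact expressions above, the helper names with_apply and with_mask are made up for this example, and the absolute numbers will differ by machine and pandas version.

```python
import timeit
import pandas as pd

def with_apply(keywords):
    # Row-by-row Python function call
    return keywords.apply(lambda x: "[" + x + "]" if "+" not in x else x)

def with_mask(keywords):
    # Vectorized: work on a copy so each timing run starts from clean data
    out = keywords.copy()
    mask = out.str.contains("+", regex=False)
    out.loc[~mask] = "[" + out.loc[~mask] + "]"
    return out

small = pd.Series(["aaaa", "+bbbb", "+cccc", "dddd"], name="keyword")
# ignore_index=True gives the big series a unique index
big = pd.concat([small] * 100_000, ignore_index=True)

runs = 3
timings = {}
for label, data in [("small", small), ("big", big)]:
    for func in (with_apply, with_mask):
        total = timeit.timeit(lambda: func(data), number=runs)
        timings[(label, func.__name__)] = total / runs

for key, seconds in timings.items():
    print(key, f"{seconds * 1000:.3f} ms per run")
```

Both helpers return the same values, so the only difference being measured is how the work is done.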