My dataframe has a summary column containing plain text. I also have a dictionary that maps new column names (keys) to lists of keywords (values). I'd like to add each of those columns to my dataframe, setting a row to 1 if any of that column's keywords appears in the row's summary, or -99 if none of them do.
Here's my code trying to accomplish this:
# headers is a list of strings, keywords is a list of lists. Each column has a list of keywords
KEYWORDS_DICT = dict(zip(headers, keywords))
for column in KEYWORDS_DICT:
    df[column] = np.where(any(df['summary'].str.contains(keyword) for keyword in KEYWORDS_DICT[column]), 1, -99)
It's currently giving me 'ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().' Is there a good way to resolve this or another way to accomplish my goal?
Thanks!
The proposed answer gave me all 1s for all columns. I was able to get my desired result by calling '|'.join() on each keyword list and then searching my summary for the resulting pattern string.
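For anyone landing here later, this is roughly what the '|'.join() approach looks like. It is a minimal sketch, not my exact code: the sample dataframe and the column names has_cat and has_dog are made up for illustration, and the re.escape() and na=False parts are assumptions I added so that keywords containing regex characters and missing summaries don't cause surprises. The original error happened because Python's built-in any() asks each boolean Series for a single truth value, whereas a single str.contains() call with a combined pattern returns one boolean per row, which is what np.where needs.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical sample data, only for illustration.
df = pd.DataFrame(
    {"summary": ["the cat sat on the mat", "dogs bark loudly", "nothing relevant here"]}
)
KEYWORDS_DICT = {"has_cat": ["cat", "kitten"], "has_dog": ["dog", "puppy"]}

for column, words in KEYWORDS_DICT.items():
    # Build one alternation pattern such as "cat|kitten".
    # re.escape is an added precaution so keywords are matched literally.
    pattern = "|".join(re.escape(word) for word in words)
    # str.contains returns a row-wise boolean Series; na=False treats
    # missing summaries as "no match" instead of propagating NaN.
    matches = df["summary"].str.contains(pattern, na=False)
    df[column] = np.where(matches, 1, -99)

print(df)
```

Note that this matches substrings, so "dog" also matches "dogs". If whole-word matching is needed, the pattern would have to be adjusted (for example with word boundaries).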