python pandas dataframe any

Initializing Pandas DataFrame Columns Based on Whether Any Substring Appears in Another Column


My dataframe has a summary column containing plain text. I also have a dictionary that maps new column names (keys) to lists of keywords (values). I'd like to add all of those columns to my dataframe, with each row set to 1 if any of the column's keywords appears in that row's summary, or -99 if none of them do.

Here's my code trying to accomplish this:

# headers is a list of strings, keywords is a list of lists.  Each column has a list of keywords
KEYWORDS_DICT = dict(zip(headers, keywords))

for column in KEYWORDS_DICT:
    df[column] = np.where(any(df['summary'].str.contains(keyword) for keyword in KEYWORDS_DICT[column]), 1, -99)
        

It currently raises 'ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().' Is there a good way to resolve this, or another way to accomplish my goal?

Thanks!


Solution

  • The proposed answer gave me all 1s for all columns. I was able to get my desired result by calling '|'.join() on each keyword list and then searching my summary column for the resulting string.
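
For reference, here is a minimal sketch of the '|'.join() approach described above. The sample dataframe, column names, and keywords are made up for illustration, and the re.escape() call and na=False argument are my own additions rather than part of the original fix. The original error comes from Python's built-in any(), which asks each boolean Series for a single truth value; a single str.contains() call with an alternation pattern returns one boolean per row instead, which is what np.where needs.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical sample data; the real df, headers, and keywords are not shown in the question.
df = pd.DataFrame(
    {"summary": ["engine failure on takeoff", "bird strike reported", "routine flight"]}
)
KEYWORDS_DICT = {
    "engine": ["engine", "turbine"],
    "wildlife": ["bird", "deer"],
}

for column, keywords in KEYWORDS_DICT.items():
    # Join the keywords into one regex alternation, e.g. "engine|turbine".
    # re.escape is an added safeguard so keywords containing regex
    # metacharacters (".", "+", "(") are matched literally.
    pattern = "|".join(re.escape(keyword) for keyword in keywords)

    # One boolean per row: True if the summary contains any keyword.
    # na=False treats missing summaries as "no match".
    mask = df["summary"].str.contains(pattern, regex=True, na=False)

    df[column] = np.where(mask, 1, -99)

print(df)
```

If the keywords are themselves meant to be regular expressions, dropping re.escape() would probably be the right call; with it in place, each keyword is treated as a plain substring.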