python pandas dataframe count words

How do I count the total number of words in a Pandas dataframe cell and add those to a new column?


A common task in sentiment analysis is counting the words in each cell of a Pandas dataframe column and storing that count in a new column. How do I do this?


Solution

  • Let's say you have a dataframe df that you've generated using

    import pandas

    df = pandas.read_csv('dataset.csv')
    

    You would then add a new column with the word count by doing the following:

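    df['new_column'] = df.columnToCount.apply(lambda x: len(str(x).split()))
    

    Calling split() with no argument splits on any run of whitespace, so repeated spaces, tabs, and newlines do not inflate the count; split(' ') would count an empty string for every extra space. Note that str(x) turns a missing value (NaN) into the string 'nan', which counts as one word. You may also want to lowercase the text and remove punctuation or numbers before counting: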

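    # Caution: this converts every column to lowercase strings, including numeric ones.
    df = df.apply(lambda x: x.astype(str).str.lower())
    # Remove digits.
    df = df.replace(r'\d+', '', regex=True)
    # Remove everything except word characters, whitespace, and '+'.
    df = df.replace(r'[^\w\s+]', '', regex=True)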
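
Putting the steps together, here is a minimal sketch rather than a definitive implementation. It assumes a small made-up dataframe in place of dataset.csv and reuses the column names columnToCount and new_column from above. As one possible variation, it cleans only the text column instead of the whole dataframe, treats missing values as zero words, and uses the vectorized .str accessor in place of apply.

```python
import pandas as pd

# Hypothetical sample data standing in for dataset.csv.
df = pd.DataFrame(
    {"columnToCount": ["Hello, world!", "It costs  42 dollars", None]}
)

# Treat missing values as empty text so they count as zero words.
text = df["columnToCount"].fillna("").astype(str)

# Lowercase, then strip digits and punctuation from this one column only.
text = text.str.lower()
text = text.str.replace(r"\d+", "", regex=True)
text = text.str.replace(r"[^\w\s]", "", regex=True)

# split() with no argument splits on any run of whitespace,
# so the gap left by the removed number does not add an extra word.
df["new_column"] = text.str.split().str.len()

print(df)
```

Doing the cleanup before counting means tokens that consist only of digits or punctuation are not counted as words. Whether that is desirable depends on the analysis.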