python pandas sentiment-analysis google-natural-language

Splitting Google sentiment analysis response into separate columns and generating `None` for cells with no value


Goal

I want to split the response from Google Sentiment Analysis into four columns, then merge them with the original content dataframe.

Situation

I'm running Google sentiment analysis on a column of text in a pandas dataframe.
Here's a sample for one of the returned rows. The column is 'sentiment':

magnitude: 0.6000000238418579\nscore: -0.6000000238418579

I then need to split that cell into four new columns: one for the magnitude label, one for its returned value, one for the score label, and one for its returned value.

What I've tried

Currently, I'm using this method to do that:

df02 = df01['sentiment'].astype(str).str.split(expand=True)

I'm then merging those four columns with the original dataframe that contains the analyzed text field and other values.

However, if sentiment returns no results, the sentiment cell is empty. And if all rows have empty sentiment cells, then it won't create four new columns. And that breaks my attempt to merge the two dataframes.

So I'm trying to understand how I can insert None into the new four column cells if the sentiment cell value is empty in the source dataframe. That way, at least I'll have four columns, with the values for each of the four new cells being None.
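
One way to keep the split-based approach and still always end up with four columns is to reindex the result of `str.split(expand=True)` to a fixed set of column labels. The following is a minimal sketch, not a definitive fix: the `split_sentiment` helper, the `text` column, the sample rows, and the names `label_1`, `value_1`, `label_2`, `value_2` are all made up for illustration, and it assumes that an empty cell is either a missing value or an empty string.

```python
import pandas as pd

# Illustrative column names; the post itself only refers to "four columns".
SPLIT_COLUMNS = ["label_1", "value_1", "label_2", "value_2"]


def split_sentiment(df, column="sentiment"):
    """Split a sentiment column into exactly four columns, even if every cell is empty."""
    cells = df[column]
    # Stringify real results, but turn missing cells into "" so they do not
    # become the literal strings "None" or "nan".
    text = cells.astype(str).where(cells.notna(), "")
    parts = text.str.split(expand=True)
    # Force the labels 0..3 to exist; columns the split did not produce
    # are created and filled with missing values.
    parts = parts.reindex(columns=range(4))
    parts.columns = SPLIT_COLUMNS
    # Optional: ask for None rather than NaN as the missing-value marker.
    # pandas treats both as missing, so isna()/notna() work either way.
    return parts.astype(object).where(parts.notna(), None)


# Hypothetical worst case: no row has a sentiment result.
df01 = pd.DataFrame({"text": ["a", "b", "c"], "sentiment": [None, "", None]})
df02 = split_sentiment(df01)
df03 = df01.join(df02)  # the indexes match, so a plain join lines the rows up
```

Because the column labels are fixed by `reindex`, the later merge or join no longer depends on how many cells happened to contain a result. Note that `reindex(columns=range(4))` would also drop any fifth or later column if a cell ever contained extra whitespace-separated tokens.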

I've received input that I should use apply() and fillna, but I'm not understanding how that should be handled in my instance, and the documentation isn't clear to me. It seems like the method above needs code added that inserts None if no value is detected, but I'm not familiar enough with Python or pandas to know where to start on that.

EXAMPLE

What the data returned looks like. If all rows have no entry, then it won't create the four columns, which is required for my next method of merging this dataframe back into the dataframe with the original text content.

|index|0|1|2|3|
|---|---|---|---|---|
|0|||||
|1|||||
|2|||||
|3|||||
|4|||||
|5|magnitude:|0\.6000000238418579|score:|-0\.6000000238418579|
|6|magnitude:|0\.10000000149011612|score:|0\.10000000149011612|
|7|magnitude:|0\.10000000149011612|score:|-0\.10000000149011612|
|8|magnitude:|0\.699999988079071|score:|-0\.699999988079071|
|9|magnitude:|0\.699999988079071|score:|-0\.30000001192092896|
|10|magnitude:|0\.699999988079071|score:|-0\.30000001192092896|

Solution

  • As mentioned by @dsx, the responses from Google Sentiment Analysis can be split into separate `magnitude` and `score` columns with the code below:

    pd.DataFrame(df['sentiment'].apply(sentiment_pass).tolist(), columns=['magnitude', 'score'], index=df.index)
    

    Sentiment Analysis is used to identify the prevailing emotions within the text using natural language processing. For more information, you can check this link.
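
The snippet above relies on a `sentiment_pass` helper that is not shown in this post. A minimal sketch of what such a helper could look like follows. The regular expression, the sample data, and the `text` column are assumptions based on the sample row in the question, and the real helper may well differ (for example, it might read the `magnitude` and `score` attributes of the API response object directly instead of parsing its string form).

```python
import re

import pandas as pd

# Assumed cell format, taken from the sample row in the question:
# "magnitude: <number>\nscore: <number>"
_PATTERN = re.compile(r"magnitude:\s*(\S+)\s+score:\s*(\S+)")


def sentiment_pass(cell):
    """Return (magnitude, score) as floats, or (None, None) if the cell has no result."""
    match = _PATTERN.search(str(cell))
    if match is None:
        return (None, None)
    return (float(match.group(1)), float(match.group(2)))


# Hypothetical data: one empty cell and one cell with a result.
df = pd.DataFrame({
    "text": ["no result", "negative review"],
    "sentiment": ["", "magnitude: 0.6000000238418579\nscore: -0.6000000238418579"],
})

# Every row yields a 2-tuple, so the two columns exist even if all cells are empty.
scores = pd.DataFrame(
    df["sentiment"].apply(sentiment_pass).tolist(),
    columns=["magnitude", "score"],
    index=df.index,
)
result = df.join(scores)
```

Since the helper always returns a pair, the resulting frame has a fixed shape, which is what makes the join back onto the original dataframe safe when no row has a sentiment result.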