How do I convert a Google Cloud Natural Language entity sentiment response to JSON/dict in Python?


I am trying to use the Google Cloud Natural Language API to analyze entity sentiment.

import os

from google.cloud import language_v1

# Point the client library at the service account key file.
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'path/to/json'

client = language_v1.LanguageServiceClient()
text_content = 'Grapes are good. Bananas are bad.'

# Available types: PLAIN_TEXT, HTML
type_ = language_v1.Document.Type.PLAIN_TEXT

# The language field is optional. If not specified, the language is automatically detected.
# For list of supported languages:
# https://cloud.google.com/natural-language/docs/languages
document = language_v1.Document(content=text_content, type_=type_)

# Available values: NONE, UTF8, UTF16, UTF32
encoding_type = language_v1.EncodingType.UTF8
response = client.analyze_entity_sentiment(request={'document': document, 'encoding_type': encoding_type})

I then print each entity in the response along with its attributes.

for entity in response.entities:
    print('=' * 20)
    print(type(entity))
    print(entity)

====================
<class 'google.cloud.language_v1.types.language_service.Entity'>
name: "Grapes"
type_: OTHER
salience: 0.8335162997245789
mentions {
  text {
    content: "Grapes"
  }
  type_: COMMON
  sentiment {
    magnitude: 0.8999999761581421
    score: 0.8999999761581421
  }
}
sentiment {
  magnitude: 0.8999999761581421
  score: 0.8999999761581421
}

====================
<class 'google.cloud.language_v1.types.language_service.Entity'>
name: "Bananas"
type_: OTHER
salience: 0.16648370027542114
mentions {
  text {
    content: "Bananas"
    begin_offset: 17
  }
  type_: COMMON
  sentiment {
    magnitude: 0.8999999761581421
    score: -0.8999999761581421
  }
}
sentiment {
  magnitude: 0.8999999761581421
  score: -0.8999999761581421
}

Now I want to convert the entire response to JSON or a dictionary so that I can store it in a database table or process it further. I tried the answers to "converting Google Cloud NLP API entity sentiment output to JSON" and "How can I JSON serialize an object from google's natural language API? (No __dict__ attribute)", but neither worked.

If I use

from google.protobuf.json_format import MessageToDict, MessageToJson 
result_dict = MessageToDict(response)
result_json = MessageToJson(response)

I get this error:

>>> result_dict = MessageToDict(response)
Traceback (most recent call last):
  File "/Users/pmehta/Anaconda-3/anaconda3/envs/nlp_36/lib/python3.6/site-packages/proto/message.py", line 555, in __getattr__
    pb_type = self._meta.fields[key].pb_type
KeyError: 'DESCRIPTOR'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/pmehta/Anaconda-3/anaconda3/envs/nlp_36/lib/python3.6/site-packages/google/protobuf/json_format.py", line 175, in MessageToDict
    return printer._MessageToJsonObject(message)
  File "/Users/pmehta/Anaconda-3/anaconda3/envs/nlp_36/lib/python3.6/site-packages/google/protobuf/json_format.py", line 209, in _MessageToJsonObject
    message_descriptor = message.DESCRIPTOR
  File "/Users/pmehta/Anaconda-3/anaconda3/envs/nlp_36/lib/python3.6/site-packages/proto/message.py", line 560, in __getattr__
    raise AttributeError(str(ex))
AttributeError: 'DESCRIPTOR'

How do I convert this response correctly to a JSON string or a dict?


Solution

  • Starting with google-cloud-language 2.0.0, response messages are proto-plus objects that wrap the raw protobuf messages. ParseDict and MessageToDict are functions from protobuf's google.protobuf.json_format module. They expect a raw protobuf message with a DESCRIPTOR attribute, so they can no longer be called directly on the proto-plus wrapper. That is why the AttributeError: 'DESCRIPTOR' in the question is raised.

    Replace

    from google.protobuf.json_format import MessageToDict, MessageToJson 
    result_dict = MessageToDict(response)
    result_json = MessageToJson(response)
    

    with

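    import json

    # to_json is a class-level method on proto-plus messages, so it is
    # called on the class with the message instance as its argument.
    result_json = response.__class__.to_json(response)
    result_dict = json.loads(result_json)
    print(result_dict)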
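
Since the goal in the question is to store the result in a database table, here is a minimal sketch of how the converted dict might be flattened into one row per entity. The helper names message_to_dict and entity_rows are hypothetical, not part of the library. SAMPLE_RESULT is an illustrative dict built from the values printed in the question, and the key names it uses (entities, name, salience, sentiment, score, magnitude) are an assumption about what to_json emits, so check them against your own output.

```python
import json


def message_to_dict(message):
    """Convert a proto-plus message to a plain dict through its class-level to_json()."""
    # Equivalent to response.__class__.to_json(response) followed by json.loads().
    return json.loads(type(message).to_json(message))


def entity_rows(result_dict):
    """Return one (name, salience, score, magnitude) tuple per entity."""
    rows = []
    for entity in result_dict.get("entities", []):
        # Fall back to an empty dict so a missing sentiment yields None values.
        sentiment = entity.get("sentiment", {})
        rows.append((
            entity.get("name"),
            entity.get("salience"),
            sentiment.get("score"),
            sentiment.get("magnitude"),
        ))
    return rows


# Illustrative data only: the shape is assumed, the values come from the question.
SAMPLE_RESULT = {
    "entities": [
        {
            "name": "Grapes",
            "salience": 0.8335162997245789,
            "sentiment": {"magnitude": 0.8999999761581421, "score": 0.8999999761581421},
        },
        {
            "name": "Bananas",
            "salience": 0.16648370027542114,
            "sentiment": {"magnitude": 0.8999999761581421, "score": -0.8999999761581421},
        },
    ]
}

for row in entity_rows(SAMPLE_RESULT):
    print(row)
```

With a real response, entity_rows(message_to_dict(response)) should give tuples that can be passed to a database driver's executemany call. Mentions are left out to keep the rows flat. If you need them, they would need a second table or a nested JSON column.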