
Access Google Cloud Natural Language with Google Colab


I am attempting to use Google's Cloud Natural Language API with Google Colab.

I started by following Google's simple example: https://cloud.google.com/natural-language/docs/samples/language-entity-sentiment-text#language_entity_sentiment_text-python

So, my Colab notebook was literally just one code cell:

from google.cloud import language_v1

client = language_v1.LanguageServiceClient()

text_content = 'Grapes are good. Bananas are bad.'

# Available types: PLAIN_TEXT, HTML
type_ = language_v1.types.Document.Type.PLAIN_TEXT

# Optional. If not specified, the language is automatically detected.
# For list of supported languages:
# https://cloud.google.com/natural-language/docs/languages
language = "en"
document = {"content": text_content, "type_": type_, "language": language}

# Available values: NONE, UTF8, UTF16, UTF32
encoding_type = language_v1.EncodingType.UTF8

response = client.analyze_entity_sentiment(request={'document': document, 'encoding_type': encoding_type})

That produced several error messages, which I seem to have resolved, mostly with the help of this SO post, by slightly updating the code as follows:

from google.cloud import language_v1

client = language_v1.LanguageServiceClient()

text_content = 'Grapes are good. Bananas are bad.'

# Available types: PLAIN_TEXT, HTML
type_ = language_v1.types.Document.Type.PLAIN_TEXT

# Optional. If not specified, the language is automatically detected.
# For list of supported languages:
# https://cloud.google.com/natural-language/docs/languages
language = "en"
#document = {"content": text_content, "type_": type_, "language": language} ## "type_" is not valid???
document = {"content": text_content, "type": type_, "language": language}

# Available values: NONE, UTF8, UTF16, UTF32
#encoding_type = language_v1.EncodingType.UTF8 ## Does not seem to work
encoding_type = "UTF8"

#response = client.analyze_entity_sentiment(request = {'document': document, 'encoding_type': encoding_type}) ## remove request
response = client.analyze_entity_sentiment(document=document, encoding_type=encoding_type)

After 10 excruciating minutes, this results in the following error:

_InactiveRpcError                         Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/google/api_core/grpc_helpers.py in error_remapped_callable(*args, **kwargs)
     72         try:
---> 73             return callable_(*args, **kwargs)
     74         except grpc.RpcError as exc:

11 frames
_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.UNAVAILABLE
    details = "Getting metadata from plugin failed with error: ("Failed to retrieve http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true from the Google Compute Enginemetadata service. Status: 404 Response:\nb''", <google.auth.transport.requests._Response object at 0x7f68cee39a90>)"
    debug_error_string = "{"created":"@1648840699.964791285","description":"Getting metadata from plugin failed with error: ("Failed to retrieve http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true from the Google Compute Enginemetadata service. Status: 404 Response:\nb''", <google.auth.transport.requests._Response object at 0x7f68cee39a90>)","file":"src/core/lib/security/credentials/plugin/plugin_credentials.cc","file_line":91,"grpc_status":14}"
>

The above exception was the direct cause of the following exception:

ServiceUnavailable                        Traceback (most recent call last)
ServiceUnavailable: 503 Getting metadata from plugin failed with error: ("Failed to retrieve http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true from the Google Compute Enginemetadata service. Status: 404 Response:\nb''", <google.auth.transport.requests._Response object at 0x7f68cee39a90>)

The above exception was the direct cause of the following exception:

RetryError                                Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/six.py in raise_from(value, from_value)

RetryError: Deadline of 600.0s exceeded while calling functools.partial(<function _wrap_unary_errors.<locals>.error_remapped_callable at 0x7f68cedb69e0>, document {
  type: PLAIN_TEXT
  content: "Grapes are good. Bananas are bad."
  language: "en"
}
encoding_type: UTF8
, metadata=[('x-goog-api-client', 'gl-python/3.7.13 grpc/1.44.0 gax/1.26.3 gapic/1.2.0')]), last exception: 503 Getting metadata from plugin failed with error: ("Failed to retrieve http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true from the Google Compute Enginemetadata service. Status: 404 Response:\nb''", <google.auth.transport.requests._Response object at 0x7f68cee39a90>)

Can you please help me with this simple "Hello world!" for Cloud Natural Language with Google Colab?

My hunch is that I need to create a service account and somehow provide that key file to Colab, like this SO answer. If so, can you hold my hand a little more and tell me how I would implement that in Colab (vs. running locally)? I am new to Colab.
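
For context on that hunch: when a client is created with no arguments, the Google client libraries fall back to Application Default Credentials, which check the `GOOGLE_APPLICATION_CREDENTIALS` environment variable before they ever try the Compute Engine metadata server named in the traceback. The sketch below only inspects the environment. `explicit_key_path` is a hypothetical helper, not part of any Google library, and it does not validate the key itself.

```python
import os


def explicit_key_path(environ=os.environ):
    """Return the key file named by GOOGLE_APPLICATION_CREDENTIALS, or None.

    Hypothetical diagnostic helper: it reports whether an explicit key file
    is configured and present on disk. It does not check the key's contents.
    """
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if path and os.path.isfile(path):
        return path
    return None


if explicit_key_path() is None:
    print("No key file configured; the client will try other credential sources.")
```

If this prints the message in Colab, pointing `os.environ["GOOGLE_APPLICATION_CREDENTIALS"]` at an uploaded key file before creating the client should, as far as I can tell, have the same effect as passing the file to the client explicitly.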


Solution

  • This appears to have worked:

    Start by creating a service account, generating a key file, and saving the JSON file locally: https://console.cloud.google.com/iam-admin/serviceaccounts (I would still love to know which roles, if any, I should select when creating the service account, under "Grant this service account access to project".)

    Cell 1: Upload the JSON file containing my service account key

    from google.colab import files
    uploaded = files.upload()
    

    Cell 2:

    from google.cloud import language_v1
    
    client = language_v1.LanguageServiceClient.from_service_account_json("my-super-important-gcp-key-file.json")
    

    Cell 3:

    text_content = 'Grapes are good. Bananas are bad.'
    type_ = language_v1.types.Document.Type.PLAIN_TEXT
    language = "en"
    document = {"content": text_content, "type": type_, "language": language}
    encoding_type = "UTF8"
    response = client.analyze_entity_sentiment(document=document, encoding_type=encoding_type)
    response
    

    Here is the output:

    entities {
      name: "Grapes"
      type: OTHER
      salience: 0.8335162997245789
      mentions {
        text {
          content: "Grapes"
        }
        type: COMMON
        sentiment {
          magnitude: 0.800000011920929
          score: 0.800000011920929
        }
      }
      sentiment {
        magnitude: 0.800000011920929
        score: 0.800000011920929
      }
    }
    entities {
      name: "Bananas"
      type: OTHER
      salience: 0.16648370027542114
      mentions {
        text {
          content: "Bananas"
          begin_offset: 17
        }
        type: COMMON
        sentiment {
          magnitude: 0.699999988079071
          score: -0.699999988079071
        }
      }
      sentiment {
        magnitude: 0.699999988079071
        score: -0.699999988079071
      }
    }
    language: "en"
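
To read that output programmatically rather than by eye, a minimal sketch along these lines may help. It assumes only the attribute names visible in the output above (`entities`, `name`, `sentiment.score`, `sentiment.magnitude`). `summarize_entities` is a hypothetical helper, and the `SimpleNamespace` object is a stand-in for the real response so the block runs without credentials.

```python
from types import SimpleNamespace


def summarize_entities(response):
    """Return (name, score, magnitude) for each entity in the response."""
    return [
        (
            entity.name,
            round(entity.sentiment.score, 2),
            round(entity.sentiment.magnitude, 2),
        )
        for entity in response.entities
    ]


# Stand-in mimicking the response shape, with values taken from the output above.
fake_response = SimpleNamespace(
    entities=[
        SimpleNamespace(name="Grapes", sentiment=SimpleNamespace(score=0.8, magnitude=0.8)),
        SimpleNamespace(name="Bananas", sentiment=SimpleNamespace(score=-0.7, magnitude=0.7)),
    ]
)

for name, score, magnitude in summarize_entities(fake_response):
    print(f"{name}: score={score}, magnitude={magnitude}")
# prints:
# Grapes: score=0.8, magnitude=0.8
# Bananas: score=-0.7, magnitude=0.7
```

In the notebook, passing the real `response` from Cell 3 instead of `fake_response` should give the same two rows. The score runs from -1.0 (negative) to 1.0 (positive), and the magnitude is a non-negative measure of strength.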
    

    I am certain that I have just violated all sorts of security protocols, so I welcome any advice on how to improve this process.
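
One possible tightening, offered as a sketch rather than a vetted security procedure: build the credentials from the uploaded bytes in memory, then delete the copy that `files.upload()` saved to the notebook's working directory. `load_key_info` and `make_client` are hypothetical helper names. The sketch assumes the `uploaded` dict maps the file name to its raw bytes and that exactly one key file was uploaded.

```python
import json
import os


def load_key_info(uploaded):
    """Parse the single service account key in the dict returned by files.upload()."""
    if len(uploaded) != 1:
        raise ValueError("expected exactly one uploaded file")
    filename, raw = next(iter(uploaded.items()))
    info = json.loads(raw.decode("utf-8"))
    # Service account key files carry a "type" field with this value.
    if info.get("type") != "service_account":
        raise ValueError(f"{filename} does not look like a service account key")
    return filename, info


def make_client(uploaded):
    """Build a client from the uploaded key, then remove the on-disk copy."""
    # Imported here so load_key_info stays usable without the Google libraries.
    from google.cloud import language_v1
    from google.oauth2 import service_account

    filename, info = load_key_info(uploaded)
    credentials = service_account.Credentials.from_service_account_info(info)
    if os.path.exists(filename):
        os.remove(filename)  # the upload also wrote the key to the working directory
    return language_v1.LanguageServiceClient(credentials=credentials)
```

In Colab this would replace Cells 1 and 2 with something like `client = make_client(files.upload())`. It also seems prudent to delete the key in the Cloud console once it is no longer needed and to avoid sharing a notebook whose files or outputs still contain it.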