I have an app that's been working for months and is now giving me an error.
The app takes tweets from the Twitter API and runs them through Google's Sentiment Analysis API, returning sentiment analysis on each of the tweets.
Without changing the code, I'm suddenly getting an error that hasn't happened before.
Error message
_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.INVALID_ARGUMENT
details = "The language sq is not supported for document_sentiment analysis."
debug_error_string = "UNKNOWN:Error received from peer ipv4:74.125.69.95:443 {grpc_message:"The language sq is not supported for document_sentiment analysis.", grpc_status:3, created_time:"2022-12-24T04:26:31.031735656+00:00"}"
>
Interpretation
Even though I'm requesting only English-language tweets in my Twitter API query (-is:retweet lang:en), my understanding of the error message is that the NL API thinks the text is in some language referred to as sq. My research says that's Albanian.
So my assumption is that the NL API is interpreting some block(s) of text in the tweets as Albanian, or maybe it's just a portion of an otherwise English tweet that contains some Albanian.
Solution
Is there a way to ignore or skip a text if the API can't process the language the text is in?
This is the language_v1 call:
def get_single_sentiment(text):
    '''gets non-entity sentiment of text using GCP's api'''
    # Instantiates a client
    client = language_v1.LanguageServiceClient()
    # The text to analyze
    document = language_v1.Document(content=text, type_=language_v1.types.Document.Type.PLAIN_TEXT)
    # Detects the sentiment of the text
    sentiment = client.analyze_sentiment(request={"document": document}).document_sentiment
    return sentiment
Below is the full error message being returned when trying to run the sentiment analysis:
---------------------------------------------------------------------------
_InactiveRpcError Traceback (most recent call last)
/opt/conda/lib/python3.7/site-packages/google/api_core/grpc_helpers.py in error_remapped_callable(*args, **kwargs)
56 try:
---> 57 return callable_(*args, **kwargs)
58 except grpc.RpcError as exc:
/opt/conda/lib/python3.7/site-packages/grpc/_channel.py in __call__(self, request, timeout, metadata, credentials, wait_for_ready, compression)
945 wait_for_ready, compression)
--> 946 return _end_unary_response_blocking(state, call, False, None)
947
/opt/conda/lib/python3.7/site-packages/grpc/_channel.py in _end_unary_response_blocking(state, call, with_call, deadline)
848 else:
--> 849 raise _InactiveRpcError(state)
850
_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.INVALID_ARGUMENT
details = "The language sq is not supported for document_sentiment analysis."
debug_error_string = "UNKNOWN:Error received from peer ipv4:74.125.69.95:443 {grpc_message:"The language sq is not supported for document_sentiment analysis.", grpc_status:3, created_time:"2022-12-24T04:26:31.031735656+00:00"}"
>
The above exception was the direct cause of the following exception:
InvalidArgument Traceback (most recent call last)
/tmp/ipykernel_1/1103340548.py in <module>
1 twitter_stage(QUERY_TW, N_HOURS_AGO
----> 2 , TWITTER_BQ_TABLE, ENTITY)
/tmp/ipykernel_1/2800777156.py in twitter_stage(QUERY, N_HOURS_AGO, TWITTER_BQ_TABLE, ENTITY)
39
40 # get sentiment analysis
---> 41 twitapi_df = get_column_sentiment(twitapi_df, text_col='text', entity=ENTITY, query=QUERY)
42
43 # Dropping columns that can't be saved to big query because they are not compatible
/tmp/ipykernel_1/2183820933.py in get_column_sentiment(df, text_col, entity, query)
110
111 # for each entry in text_col, get a single sentiment result
--> 112 sentiment_column = df[text_col].apply(f)
113
114 # for each entry in sentiment_column, fix null values (replace nulls will two values)
/opt/conda/lib/python3.7/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwargs)
4355 dtype: float64
4356 """
-> 4357 return SeriesApply(self, func, convert_dtype, args, kwargs).apply()
4358
4359 def _reduce(
/opt/conda/lib/python3.7/site-packages/pandas/core/apply.py in apply(self)
1041 return self.apply_str()
1042
-> 1043 return self.apply_standard()
1044
1045 def agg(self):
/opt/conda/lib/python3.7/site-packages/pandas/core/apply.py in apply_standard(self)
1099 values,
1100 f, # type: ignore[arg-type]
-> 1101 convert=self.convert_dtype,
1102 )
1103
/opt/conda/lib/python3.7/site-packages/pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()
/tmp/ipykernel_1/2183820933.py in get_single_sentiment(text)
16
17 # Detects the sentiment of the text
---> 18 sentiment = client.analyze_sentiment(request={"document": document}).document_sentiment
19
20 return sentiment
/opt/conda/lib/python3.7/site-packages/google/cloud/language_v1/services/language_service/client.py in analyze_sentiment(self, request, document, encoding_type, retry, timeout, metadata)
509 retry=retry,
510 timeout=timeout,
--> 511 metadata=metadata,
512 )
513
/opt/conda/lib/python3.7/site-packages/google/api_core/gapic_v1/method.py in __call__(self, timeout, retry, *args, **kwargs)
152 kwargs["metadata"] = metadata
153
--> 154 return wrapped_func(*args, **kwargs)
155
156
/opt/conda/lib/python3.7/site-packages/google/api_core/retry.py in retry_wrapped_func(*args, **kwargs)
286 sleep_generator,
287 self._deadline,
--> 288 on_error=on_error,
289 )
290
/opt/conda/lib/python3.7/site-packages/google/api_core/retry.py in retry_target(target, predicate, sleep_generator, deadline, on_error)
188 for sleep in sleep_generator:
189 try:
--> 190 return target()
191
192 # pylint: disable=broad-except
/opt/conda/lib/python3.7/site-packages/google/api_core/grpc_helpers.py in error_remapped_callable(*args, **kwargs)
57 return callable_(*args, **kwargs)
58 except grpc.RpcError as exc:
---> 59 raise exceptions.from_grpc_error(exc) from exc
60
61 return error_remapped_callable
InvalidArgument: 400 The language sq is not supported for document_sentiment analysis.
Proposed Solution
I'm thinking the best solution is to ignore any non-English text. Is that a reasonable approach, and does anyone have input on how to implement it?
Greatly appreciate any input on resolving this. Thanks.
Tweet Content Causing Problem
Update| #shqip #shqiperi #kosova #albania #kosovo #shqiptar #shqiptare #lajme #shqiperia #tirana #prishtina #visitalbania #albanian #tirane #albaniangirl #shqipe…
The issue can be resolved by explicitly specifying the document language in the code, i.e. define type_, set language to en, and pass both in the document, so the API does not have to detect the language itself.
For example:
type_ = language_v1.Document.Type.PLAIN_TEXT
language = "en"
document = {"type_": type_, "content": content, "language": language}
Sample code:
from google.cloud import language_v1

def sample_analyze_sentiment(content):
    client = language_v1.LanguageServiceClient()
    # Accept bytes as well as str (same as the six.binary_type check on Python 3)
    if isinstance(content, bytes):
        content = content.decode("utf-8")
    type_ = language_v1.Document.Type.PLAIN_TEXT
    # Declare the language explicitly instead of relying on auto-detection
    language = "en"
    document = {"type_": type_, "content": content, "language": language}
    response = client.analyze_sentiment(request={"document": document})
    sentiment = response.document_sentiment
    print("Score: {}".format(sentiment.score))
    print("Magnitude: {}".format(sentiment.magnitude))
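If you would rather skip texts the API rejects than force them to en, one option is to catch the error and return None for that row. The following is only a minimal sketch of that idea, not something taken from the library's documentation: sentiment_or_none, FakeInvalidArgument and fake_analyze are hypothetical names, and the fake pieces stand in for the real API call so the snippet runs without credentials. It assumes the exception text contains "is not supported", as in the traceback above.

```python
from typing import Callable, Optional, Tuple, Type


def sentiment_or_none(
    text: str,
    analyze: Callable[[str], object],
    skip_on: Tuple[Type[Exception], ...],
) -> Optional[object]:
    """Return analyze(text), or None when the language is unsupported."""
    try:
        return analyze(text)
    except skip_on as exc:
        # Swallow only the "language not supported" case (assumed message
        # fragment, copied from the traceback); re-raise everything else.
        if "is not supported" in str(exc):
            return None
        raise


class FakeInvalidArgument(Exception):
    """Hypothetical stand-in for the API's invalid-argument error."""


def fake_analyze(text: str) -> float:
    """Hypothetical stand-in for the real sentiment call."""
    if "#shqip" in text:
        raise FakeInvalidArgument(
            "400 The language sq is not supported for document_sentiment analysis."
        )
    return 0.5


texts = ["great day", "Update| #shqip #shqiperi"]
results = [sentiment_or_none(t, fake_analyze, (FakeInvalidArgument,)) for t in texts]
print(results)
```

In the real pipeline you would presumably pass get_single_sentiment as analyze and (google.api_core.exceptions.InvalidArgument,) as skip_on, since that is the exception type shown at the end of the traceback. Rows that come back as None can then be dropped or filled in before writing to BigQuery.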