I'm working on a sentiment analysis project using the Google Cloud Natural Language API and Python. This question might be similar to this other question. What I'm doing is the following:
I'll put my code below, but first I want to mention that I have tested it with a sample CSV of fewer than 100 records and it works well. I'm also aware of the quota limit of 600 requests per minute, which is why I put a delay on each iteration. Still, I'm getting the error mentioned in the title. I'm also aware of the suggestion to increase the ulimit, but I don't think that's a good solution.
Here's my code:
from google.cloud import language_v1
from google.cloud.language_v1 import enums
from google.cloud import storage
from time import sleep
import pandas
import sys

pandas.options.mode.chained_assignment = None


def parse_csv_from_gcs(csv_file):
    # Load the input CSV from the path passed on the command line
    df = pandas.read_csv(csv_file, encoding="ISO-8859-1")
    return df


def analyze_sentiment(text_content):
    client = language_v1.LanguageServiceClient()
    type_ = enums.Document.Type.PLAIN_TEXT
    language = 'es'
    document = {"content": text_content, "type": type_, "language": language}
    encoding_type = enums.EncodingType.UTF8
    response = client.analyze_sentiment(document, encoding_type=encoding_type)
    return response


gcs_path = sys.argv[1]
output_bucket = sys.argv[2]
output_csv_file = sys.argv[3]

dataframe = parse_csv_from_gcs(gcs_path)

# Score each record, pausing between requests to stay under the quota
for i in dataframe.index:
    print(i)
    response = analyze_sentiment(dataframe.at[i, 'FieldOfInterest'])
    dataframe.at[i, 'Score'] = response.document_sentiment.score
    dataframe.at[i, 'Magnitude'] = response.document_sentiment.magnitude
    sleep(0.5)

print(dataframe)

# Write the results locally, then upload them to the output bucket
dataframe.to_csv("results.csv", encoding='ISO-8859-1')
gcs = storage.Client()
gcs.get_bucket(output_bucket).blob(output_csv_file).upload_from_filename('results.csv', content_type='text/csv')
The 'analyze_sentiment' function is very similar to the one in Google's documentation; I just modified it a little, and it does pretty much the same thing.
Now, the program raises that error and crashes when it reaches a record somewhere between 550 and 700, but I don't see the connection between the service account JSON and calling the Natural Language API. My guess is that every time I call the API, it opens the account credential JSON file but doesn't close it afterwards.
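One way to test that suspicion would be to count the process's open file descriptors every few iterations. This is only a minimal sketch, assuming Linux or macOS; the helper names `open_fd_count` and `fd_soft_limit` are made up for illustration and are not part of my script:

```python
import os
import resource  # Unix-only standard library module


def open_fd_count():
    """Return how many file descriptors this process currently has open."""
    # /proc/self/fd exists on Linux and /dev/fd on macOS; each lists one
    # entry per open descriptor of the current process.
    for fd_dir in ("/proc/self/fd", "/dev/fd"):
        if os.path.isdir(fd_dir):
            return len(os.listdir(fd_dir))
    raise OSError("no per-process fd directory found on this platform")


def fd_soft_limit():
    """Return the soft limit on open files (the value `ulimit -n` reports)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft
```

Printing `open_fd_count()` next to `i` inside the loop would show whether the count climbs with every API call toward `fd_soft_limit()`.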
I'm currently stuck on this issue and have run out of ideas, so any help will be much appreciated. Thanks in advance =)!
[UPDATE]
I've solved this issue by moving the 'client' out of the 'analyze_sentiment' function and passing it in as a parameter, as follows:
def analyze_sentiment(text_content, client):
    <Code>
It looks like every time execution reaches this line:
client = language_v1.LanguageServiceClient()
it opens the account credential JSON file and never closes it, so creating the client once, outside the function, made this work =).
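For anyone landing here, this is roughly what the reworked flow looks like. It is a sketch rather than my exact script, and it assumes the same pre-2.0 client library as above; `score_all`, `doc_type`, `delay` and `main` are illustrative names I'm introducing here:

```python
from time import sleep


def analyze_sentiment(text_content, client, doc_type, encoding_type, language="es"):
    """Send one text to the API using a client the caller already created."""
    document = {"content": text_content, "type": doc_type, "language": language}
    return client.analyze_sentiment(document, encoding_type=encoding_type)


def score_all(texts, client, doc_type, encoding_type, delay=0.5):
    """Score every text with the same client; returns (score, magnitude) pairs."""
    results = []
    for text in texts:
        response = analyze_sentiment(text, client, doc_type, encoding_type)
        sentiment = response.document_sentiment
        results.append((sentiment.score, sentiment.magnitude))
        sleep(delay)  # keep the per-request pause from the original loop
    return results


def main(texts):
    # Imported here so the helpers above can be used without the library
    # installed (for example with a stub client).
    from google.cloud import language_v1
    from google.cloud.language_v1 import enums  # pre-2.0 library, as in the post

    client = language_v1.LanguageServiceClient()  # created exactly once
    return score_all(texts, client,
                     enums.Document.Type.PLAIN_TEXT, enums.EncodingType.UTF8)
```

The only change that matters is that `LanguageServiceClient()` is constructed a single time and reused for every record, instead of once per call.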
I've updated the original post with the solution, but in any case, thanks to everyone who saw this and tried to reply =)!