Am struggling to capture the results from the IBM Watson entity analysis in a dictionary. I would like to extract the sentiment of each link through a function. I have a function created to extract a single url. But the dictionary am trying to store the results captures only the last url results. I am new to Python, and appreciate any help.
Here is my entity analysis code,
# function to process an URL
def processurl(url_to_analyze):
# end point
endpoint = f"{URL}/v1/analyze"
# credentials
username = "apikey"
password = API_KEY
# parameters
parameters = {
"version": "2020-08-01"
}
# headers
headers = {
"Content-Type":"application/json"
}
# watson options
watson_options = {
"url": url_to_analyze,
"features": {
"entities": {
"sentiment": True,
"emotion": True,
"limit":10
}
}
}
# return
response = requests.post(endpoint,
data=json.dumps(watson_options),
headers=headers,
params=parameters,
auth=(username,password)
)
return response.json()
here is the function I created to pass the result from above
# create a function to extract the entities from the result data
def getentitylist(data,threshold):
result = []
for entity in data["entities"]:
relevance = float(entity["relevance"])
if relevance > threshold:
result.append(entity["text"])
return result
After looping through the URL's, I can't seemed to store the result in a dictionary so that I can pass that to my function for entity results
# method II: loop through news api urls and perform entity analysis and store it in a dictionary
entitydict = {}
for url in url_to_analyze:
entitydict.update(processurl(url))
I can't see where you are calling getentitylist
, but in your url loop
entitydict = {}
for url in url_to_analyze:
entitydict.update(processurl(url))
update
will be updating the dictionary based on key values. ie. this will overwrite the values for any keys already in the dictionary. As your response will look something like:
{
"usage": {
"text_units": 1,
"text_characters": 2708,
"features": 1
},
"retrieved_url": "http://www.cnn.com/",
"language": "en",
"entities": [
{
"type": "Company",
"text": "CNN",
"sentiment": {
"score": 0.0,
"label": "neutral"
},
"relevance": 0.784947,
"disambiguation": {
"subtype": [
"Broadcast",
"AwardWinner",
"RadioNetwork",
"TVNetwork"
],
"name": "CNN",
"dbpedia_resource": "http://dbpedia.org/resource/CNN"
},
"count": 9
}
]
}
The keys that will be updated are at the top level ie. usage
, retrieved_url
, retrieved_url
, entities
. So entitydict
will only contain the response for the last url, as previous values for these keys will get overwritten.
What you should be doing is use the url as key to each response.
entitydict = {}
for url in url_to_analyze:
entitydict.update({url : processurl(url)})