python, ibm-watson, watson-nlu

Struggling to create a dictionary to store results from IBM Watson Entity Analysis


I am struggling to capture the results from the IBM Watson entity analysis in a dictionary. I would like to extract the sentiment of each link through a function. I have a function that analyzes a single URL, but the dictionary I am using to store the results only keeps the last URL's results. I am new to Python and would appreciate any help.

Here is my entity analysis code:

import json
import requests

# function to process a URL
def processurl(url_to_analyze):
  # endpoint (URL and API_KEY are defined elsewhere)
  endpoint = f"{URL}/v1/analyze"

  # credentials
  username = "apikey"
  password = API_KEY

  # parameters
  parameters = {
      "version": "2020-08-01"
  }

  # headers
  headers = {
      "Content-Type":"application/json"
  }

  # watson options
  watson_options = {
      "url": url_to_analyze,
      "features": {
          "entities": {
              "sentiment": True,
              "emotion": True,
              "limit":10
          }
      }
  }

  # send the request to the Watson NLU endpoint
  response = requests.post(endpoint,
                           data=json.dumps(watson_options),
                           headers=headers,
                           params=parameters,
                           auth=(username,password)
                           )
  return response.json()

Here is the function I created to process the result from above:

# create a function to extract the entities from the result data
def getentitylist(data,threshold):
  result = []
  for entity in data["entities"]:
    relevance = float(entity["relevance"])
    if relevance > threshold:
      result.append(entity["text"])
  return result
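For example, given a trimmed-down response (the entities list below is a made-up sample, not real Watson output), getentitylist keeps only the entity names whose relevance exceeds the threshold:

```python
# a minimal, made-up sample of the "entities" portion of a Watson NLU response
sample = {
    "entities": [
        {"text": "CNN", "relevance": 0.78},
        {"text": "London", "relevance": 0.31},
    ]
}

# same function as above, repeated here so the example is self-contained
def getentitylist(data, threshold):
  result = []
  for entity in data["entities"]:
    relevance = float(entity["relevance"])
    if relevance > threshold:
      result.append(entity["text"])
  return result

print(getentitylist(sample, 0.5))  # ['CNN']
```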

After looping through the URLs, I can't seem to store the results in a dictionary so that I can pass them to my function for the entity results:

# method II: loop through news api urls and perform entity analysis and store it in a dictionary
entitydict = {}
for url in url_to_analyze:
  entitydict.update(processurl(url))

Solution

  • I can't see where you are calling getentitylist, but look at your URL loop:

    entitydict = {}
    for url in url_to_analyze:
      entitydict.update(processurl(url))
    

    update merges dictionaries by key, i.e. it overwrites the values of any keys already present in the dictionary. Your response will look something like:

    {
      "usage": {
        "text_units": 1,
        "text_characters": 2708,
        "features": 1
      },
      "retrieved_url": "http://www.cnn.com/",
      "language": "en",
      "entities": [
        {
          "type": "Company",
          "text": "CNN",
          "sentiment": {
            "score": 0.0,
            "label": "neutral"
          },
          "relevance": 0.784947,
          "disambiguation": {
            "subtype": [
              "Broadcast",
              "AwardWinner",
              "RadioNetwork",
              "TVNetwork"
            ],
            "name": "CNN",
            "dbpedia_resource": "http://dbpedia.org/resource/CNN"
          },
          "count": 9
        }
      ]
    }
    
    

    The keys being updated are at the top level, i.e. usage, retrieved_url, language, and entities. So entitydict will only contain the response for the last URL, because the previous values for these keys get overwritten.
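A quick sketch of the overwriting behaviour, using made-up stand-in values for two responses:

```python
entitydict = {}
# pretend these are two Watson responses sharing the same top-level keys
entitydict.update({"retrieved_url": "http://first.example", "entities": ["A"]})
entitydict.update({"retrieved_url": "http://second.example", "entities": ["B"]})

# the second update replaced both keys, so the first response is gone
print(entitydict)
```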

    What you should do instead is use the URL as the key for each response:

    entitydict = {}
    for url in url_to_analyze:
      entitydict.update({url : processurl(url)})
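With the responses keyed by URL, nothing gets overwritten, and you can then run getentitylist over each one. A sketch of the full flow, using a stubbed processurl with made-up data so it runs without network access (the real processurl would call the Watson endpoint as in the question):

```python
# stand-in for the real processurl, returning a fixed fake response
def processurl(url):
    return {"retrieved_url": url,
            "entities": [{"text": "CNN", "relevance": 0.78},
                         {"text": "London", "relevance": 0.31}]}

# same extraction logic as in the question
def getentitylist(data, threshold):
    return [e["text"] for e in data["entities"]
            if float(e["relevance"]) > threshold]

url_to_analyze = ["http://www.cnn.com/", "http://www.bbc.com/"]

# one response per URL, keyed by the URL itself
entitydict = {url: processurl(url) for url in url_to_analyze}

# then pull the high-relevance entities for each URL
entitylists = {url: getentitylist(data, 0.5)
               for url, data in entitydict.items()}
print(entitylists)
```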