Get document name of IBM Discovery


I have integrated Watson Discovery and Watson Assistant so that users can query the Discovery documents from Assistant. At the moment I display the passage that has the highest passage_score. Now I also want to display the name of the document the passage was fetched from. Below is my Node.js code.

function main(params) {
  const DiscoveryV1 = require('watson-developer-cloud/discovery/v1');

  return new Promise(function (resolve, reject) {
    const discovery = new DiscoveryV1({
      url: 'https://gateway-lon.watsonplatform.net/discovery/api',
      iam_apikey: 'vvvvvvvvvv', /* watson discovery api key */
      version: '2018-12-03'
    });

    discovery.query(
      {
        environment_id: 'vvvvvv', /* watson discovery environment id */
        collection_id: 'vvvvvvvvvv', /* watson discovery collection id */
        natural_language_query: params.message,
        passages: true
      },
      function (err, data) {
        if (err) {
          return reject(err);
        }

        // Passages come back sorted by passage_score, so index 0 is the highest-scoring one.
        return resolve(data.passages[0]);
      }
    );
  });
}

Can anybody suggest the modifications to display the document name?


Solution

  • @msr_003 There are two ways you can handle this. Each passage in the response carries the document_id of the document it was extracted from, and that document_id maps to the id field of the documents returned in the same query response. So you can look up the extracted_metadata.filename field of the document whose id equals the passage's document_id. It is admittedly confusing that the same value is referred to by two different field names.

    Also, note that the number of documents returned in your query response affects whether the document that the passage came from is actually included. For example, say you return 5 passages and choose to return 5 documents. It's very possible that one or more of the passages came from documents that aren't among the top 5 returned. To make this less likely, return a larger number of documents in your query response, for example the top 100 documents when you return 5 passages.

    The other thing you can do is use the document details API (https://cloud.ibm.com/apidocs/discovery#get-document-details) to get the details of the document the passage came from. This takes a second API call and will be slower, but it eliminates the chance that the document is missing from the original query result.
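
Both approaches can be combined, roughly as in the sketch below: look for the passage's document among the query results first, and only make the second API call when it is not there. The helper names (findFilenameInResults, fetchFilename, resolveFilename) and the ids object are hypothetical. The sketch also assumes that getDocumentStatus is the Node SDK method behind the document details endpoint and that its response has a filename field, so check that against the SDK version you have installed.

```javascript
// Sketch only: helper names are hypothetical, field names follow the answer above.

// Way 1: match the passage's document_id against the id of each returned document.
function findFilenameInResults(data, passage) {
  const results = (data && data.results) || [];
  const doc = results.find(function (d) {
    return d.id === passage.document_id;
  });
  if (!doc || !doc.extracted_metadata) {
    return null; // source document was not among the returned results
  }
  return doc.extracted_metadata.filename || null;
}

// Way 2: ask Discovery for the document's details in a second call.
// Assumption: `discovery` is a DiscoveryV1 client, and getDocumentStatus is the
// SDK method for the "get document details" endpoint, returning `filename`.
function fetchFilename(discovery, ids, passage) {
  return new Promise(function (resolve, reject) {
    discovery.getDocumentStatus(
      {
        environment_id: ids.environment_id,
        collection_id: ids.collection_id,
        document_id: passage.document_id
      },
      function (err, details) {
        if (err) {
          return reject(err);
        }
        resolve((details && details.filename) || null);
      }
    );
  });
}

// Combined: use the query response when possible, fall back to the second call.
function resolveFilename(discovery, ids, data, passage) {
  const filename = findFilenameInResults(data, passage);
  if (filename) {
    return Promise.resolve(filename);
  }
  return fetchFilename(discovery, ids, passage);
}
```

Inside the query callback from the question, this could be used by taking data.passages[0], calling resolveFilename(discovery, ids, data, passage), and resolving the outer promise with both the passage text and the filename. Adding count: 100 to the query parameters (as suggested above) should make the fallback call rare.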