I used the IBM Watson Retrieve and Rank web interface to create a collection of HTML articles, and I was able to upload the articles through it. The problem is that when I query the collection, the values returned for id and title are not usable. Here is the query I made in the browser:
https://MY-USER-NAME:MY-PASSWORD@gateway.watsonplatform.net/retrieve-and-rank/api/v1/solr_clusters/MY-CLUSTER/solr/MY-COLLECTION/select?q=what is the basic mechanism of the transonic aileron buzz&wt=json&fl=id,title
The response I get is:
{"responseHeader":{"status":0,"QTime":106,"params":{"q":"what is the basic mechanism of the transonic aileron buzz","fl":"id,title","wt":"json"}},"response":{"numFound":12,"start":0,"docs":[{"id":"6a06f47c-cb3f-4791-9914-c84772eb9415","title":"no-title"}.....
The id and title values in that response are the problem. When using the web interface, is there a way to set the title and id when uploading documents? Or, better yet, is there another way I can query my collection to get the file name of the document I uploaded and/or the text from the document?
When using the web interface is there a way to set the title and id when uploading documents?
No, sorry.
However, if you upload the documents yourself from outside of the web interface, you can specify the title and ID (and the documents will be shown in the web interface when you come back to it).
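As a rough sketch of what "uploading yourself" could look like: the snippet below only builds the request body, assuming the collection accepts Solr's standard JSON update format (an add command plus a commit). The field names id, title and body are the ones discussed in this thread; the helper name build_add_payload and the sample values are made up for illustration, and whether your collection's schema accepts exactly these fields is an assumption you would need to check.

```python
import json


def build_add_payload(doc_id, title, body):
    """Build a Solr-style JSON update command that adds one document and commits.

    Hypothetical helper: the field names come from this thread, and the
    collection's schema is assumed to accept them.
    """
    command = {
        "add": {"doc": {"id": doc_id, "title": title, "body": body}},
        "commit": {},
    }
    return json.dumps(command)


# Illustrative values only; use your own ID, title and text.
payload = build_add_payload(
    "aileron-buzz-01",
    "Transonic aileron buzz",
    "Plain text of the article goes here.",
)
print(payload)
```

The resulting string would be sent as the body of a POST with a JSON content type to the collection's update handler, using the same credentials as the queries.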
is there another way I query my collection to get the file name of the document I uploaded
Yes.
In the query you posted above, the last parameter, fl, lists the fields you want to retrieve:
&fl=id,title
You're retrieving the ID and the title.
If you want the name of the file that the content came from, add fileName. For example:
https://MY-USER-NAME:MY-PASSWORD@gateway.watsonplatform.net/retrieve-and-rank/api/v1/solr_clusters/MY-CLUSTER/solr/MY-COLLECTION/select?q=what is the basic mechanism of the transonic aileron buzz&wt=json&fl=id,title,fileName
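If you end up building these URLs in code rather than typing them into the browser, something like the following might help. It is only a sketch: select_url is a made-up helper, the base URL is copied from the query above, and the credentials are deliberately left out (you would supply them through your HTTP client's basic auth instead of embedding them in the URL). It also percent-encodes the spaces in the question, which the browser was doing for you.

```python
from urllib.parse import quote, urlencode

# Base URL copied from the query in the question.
BASE = "https://gateway.watsonplatform.net/retrieve-and-rank/api/v1"


def select_url(cluster, collection, question, fields):
    """Build a select URL whose fl parameter lists the fields to return."""
    params = urlencode(
        {"q": question, "wt": "json", "fl": ",".join(fields)},
        safe=",",          # keep the commas in the fl list readable
        quote_via=quote,   # encode spaces as %20 rather than +
    )
    return f"{BASE}/solr_clusters/{cluster}/solr/{collection}/select?{params}"


url = select_url(
    "MY-CLUSTER",
    "MY-COLLECTION",
    "what is the basic mechanism of the transonic aileron buzz",
    ["id", "title", "fileName"],
)
print(url)
```

Changing the last argument to ["id", "title", "body"] or any other list of fields changes only the fl part of the URL.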
is there another way I query my collection to get text from the document
Yes.
As above, you just need to update the list of fields that you retrieve. The contents of the document are stored in a field called body.
So to get the ID, title, and body, you could use:
https://MY-USER-NAME:MY-PASSWORD@gateway.watsonplatform.net/retrieve-and-rank/api/v1/solr_clusters/MY-CLUSTER/solr/MY-COLLECTION/select?q=what is the basic mechanism of the transonic aileron buzz&wt=json&fl=id,title,body
That gets you a plain text version of the contents. If you want the HTML, use contentHtml instead.
https://MY-USER-NAME:MY-PASSWORD@gateway.watsonplatform.net/retrieve-and-rank/api/v1/solr_clusters/MY-CLUSTER/solr/MY-COLLECTION/select?q=what is the basic mechanism of the transonic aileron buzz&wt=json&fl=id,title,contentHtml
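Once you have the JSON back, pulling the fields out of it is straightforward. The sketch below assumes the response has the same shape as the one in the question (response, then docs). The helper names are made up, the fileName value in the sample is a placeholder rather than real output, and the list handling is there only in case a field comes back as a list, which depends on how the field is defined in the schema.

```python
import json


def first_value(value):
    """Return the first element if the field came back as a non-empty list."""
    if isinstance(value, list) and value:
        return value[0]
    return value


def summarize(response_text, fields):
    """Extract the requested fields from each document in a select response."""
    docs = json.loads(response_text)["response"]["docs"]
    return [{name: first_value(doc.get(name)) for name in fields} for doc in docs]


# Sample shaped like the response in the question; "example.html" is a placeholder.
sample = json.dumps({
    "response": {
        "numFound": 1,
        "start": 0,
        "docs": [{
            "id": "6a06f47c-cb3f-4791-9914-c84772eb9415",
            "title": "no-title",
            "fileName": ["example.html"],
        }],
    }
})

for row in summarize(sample, ["id", "fileName", "body"]):
    print(row)
```

Fields that were not requested in fl are simply absent from each document, so they show up as None here.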