solrwatsonretrieve-and-rank

Watson Retrieve and Rank service returns no search results, and a Solr error when trying to train Ranker with custom features


so I have the following problem:

When I try to use the Ranker I have trained to search I get the following error message:

pysolr.SolrError: Solr responded with an error (HTTP 500): [Reason: Can not rerank results. Verify that your schema has not changed in incompatible ways.]

This is how I request the result:

pysolr_client._send_request("GET", path='/fcselect?q=%s&ranker_id=%s&wt=json' % (Question, ranker_id))

When I try to not do it through Python but through curl I get the following Error:

{"responseHeader":{"status":400,"QTime":1},"error":{"metadata":["error-class","org.apache.solr.common.SolrException","root-error-class","org.apache.solr.common.SolrException"],"msg":"Bad contentType for search handler :application/octet-stream request=...}","code":400}}

(I left out the request itself to not post the ranker id here).

The curl request I sent is the following:

curl -X POST -u "*username*":"*password*" "https://gateway.watsonplatform.net/retrieve-and-rank/api/v1/solr_clusters/*solr_cluster_id*/solr/Question_collection/fcselect?ranker_id=*ranker_id*&q=*question*?&wt=json"

I found the following solution to curl: just adding a -H "Content-Type: application/json" and, well, it shows me some documents as a result, but at the end of the response it still shows the same error. Additionally I see the following trace:

org.apache.solr.common.SolrException: Can not rerank results. Verify that your schema has not changed in incompatible ways.
at com.ibm.watson.hector.plugins.utils.ExceptionHandlingUtil.logAndThrowSolrException(ExceptionHandlingUtil.java:36)
at com.ibm.watson.hector.plugins.ss.FCFeatureGeneratorComponent.rerank(FCFeatureGeneratorComponent.java:743)
at com.ibm.watson.hector.plugins.ss.FCFeatureGeneratorComponent.process(FCFeatureGeneratorComponent.java:348)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:272)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:155)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2102)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:460)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:499)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:745)

The problem is that between training the ranker and using it I haven't even touched anything else. Not the schema, not the collection, none of the names. And I only have one collection, one configuration, one of everything, except for the documents - 294 of them.

The whole process I went through worked for a Ranker without custom features. But with custom features it doesn't.

I have gone through this tutorial to create my Watson Ranker with custom features: https://medium.com/machine-learning-with-ibm-watson/developing-with-ibm-watson-retrieve-and-rank-part-3-custom-features-826fe88a5c63

As far as I understand all I did thanks to this tutorial was changing the trainingdata.txt file, the training process is the same.

And now I have run out of ideas what to check to solve the problem..

Do you have any suggestions?

Thanks a lot in advance! :)


Solution

  • It was a stupid thing in the server.py of the rr_custom_scorer_proxy. When it writes the 'answer' csv that has to be reranked by the ranker it opens the file in 'wt' mode which leads to blank lines in between each line. This can't be handled by the ranker and we get an error.

    If you change it to 'wb' verything works well.