I'm writing the CSV file to train a ranker in the Watson Retrieve and Rank service, with many rows of the form [query, "id_doc", "relevance_score", ...].
I have two questions about the structure of this file:
1. Suppose one document is relevant only for managers and another only for non-managers. If the query is like "I'm a manager. How do I....?" then the first document is correct, but not the second one; if the query is like "I'm not a manager..." then the second document is correct, but not the first one.
Is there any particular syntax that can be used to write the query in a proper way? Maybe using Boolean operators? Is this file the right place to apply this kind of filter?
2. This service also has a web interface to train a ranker. The rating scale used on that site is: 1 -> incorrect answer, 2 -> relevant to the topic but doesn't answer the question, 3 -> good, but can be improved, 4 -> perfect answer.
Is the relevance score used in this file the same as the one used in the web interface?
Thank you!
Is there any particular syntax that can be used to write the query in a proper way? Maybe using Boolean operators? Is this file the right place to apply this kind of filter?
As you hinted, this file is not quite the appropriate place for using filters. The training data will be used to figure out what types of lexical overlap features the ranker should pay attention to when trying to optimize the ordering of the search results from Solr (see discussion here for more information: watson retrieve-and-rank - manual ranking).
That said, you can certainly add at least two rows to your training data like so:
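(A hypothetical sketch: the document IDs and relevance values below are placeholders, and each row pairs the query text with doc-id/relevance pairs in the [query, "id_doc", "relevance_score", ...] format described above.)

```
"I am a manager. How do I approve a request?",doc_manager,4,doc_non_manager,0
"I am not a manager. How do I submit a request?",doc_non_manager,4,doc_manager,0
```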
With a sufficient number of such examples, hopefully the ranker will learn to pay attention to bigram lexical overlap features. If this is not working, you can certainly play with pre-detecting "manager" vs. "not manager" and applying appropriate filters, but I believe that's done with a separate parameter (`fq`?)... so you might have to modify `train.py` to pass the filter query appropriately (the default `train.py` takes the full query and passes it via the `q` parameter to the `/fcselect` endpoint).
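If you do go down that route, a minimal sketch of the kind of change involved might look like the following. This is not the actual `train.py`: the cluster ID, collection name, the `audience` Solr field, and the `detect_audience` helper are all hypothetical, and it simply assumes `fq` is forwarded to Solr like any other query parameter.

```python
import requests

# Hypothetical values -- substitute your own Solr cluster ID, collection name, and credentials.
FCSELECT_URL = ("https://gateway.watsonplatform.net/retrieve-and-rank/api/v1"
                "/solr_clusters/YOUR_CLUSTER_ID/solr/YOUR_COLLECTION/fcselect")

def detect_audience(query):
    """Toy pre-detection step: choose a Solr filter query based on the query text.
    The 'audience' field and its values are hypothetical."""
    if "not a manager" in query.lower():
        return "audience:non_manager"
    return "audience:manager"

def fcselect(query, username, password):
    """Issue the query the way the default train.py does (via q), plus a separate
    fq filter query -- the modification suggested above."""
    params = {
        "q": query,                    # full query text, as train.py already sends it
        "fq": detect_audience(query),  # pre-detected filter (assumes fq is passed through to Solr)
        "rows": 10,
        "wt": "json",
    }
    return requests.get(FCSELECT_URL, params=params, auth=(username, password))
```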
Is the relevance score used in this file the same as the one used in the web interface?
Not quite: the web interface uses the 1-4 star rating to improve the UI for data collection, but then compresses the star ratings to a smaller relevance-label scale when generating the training data for the ranker. I think the compression gives bad answers (i.e. star ratings < 3) a relevance label of `0` and passes the higher star ratings through as-is, so that effectively there are 3 levels of rating (though maybe someone on the UI team can add clarification on the details if need be). It is important for the underlying ranking algorithm that bad answers receive a relevance label of `0`.
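In code, the compression I'm describing would be roughly the following (my reading of the behavior, not the confirmed implementation):

```python
def star_to_relevance(stars):
    """Map a 1-4 star rating from the web UI to a ranker relevance label.
    Assumption: bad answers (stars < 3) become 0; higher ratings pass through as-is."""
    return 0 if stars < 3 else stars

# 1 -> 0, 2 -> 0, 3 -> 3, 4 -> 4  (three effective levels: 0, 3, 4)
```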