I'm trying to use Elasticsearch's Percolator and I have a minor issue.
Suppose our document looks like this:
{
"doc": {
"full_name": "Pacman",
"company": "Arcade Game LTD",
"occupation": "hunter",
"tags": ["Computer Games"]
}
}
And our registered query looks like this:
{
"query": {
"bool": {
"must": [
{
"match_phrase":{
"occupation": "hunter"
}
},
{
"terms": {
"tags": [
"Computer Games",
"Electronic Sports"
],
"minimum_match": 1
}
}
]
}
}
}
I get:
{
"took": 3,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"total": 0,
"matches": []
}
and I don't know what I'm doing wrong, because if I remove terms from the registered query and match only on occupation, it works as expected and I get one match.
Any hints?
Update 1
OK, I think that @Slam's solution is the right direction, but I still have some issues:
I updated my mapping for tags, so it now looks like this:
"tags": {
"store": true,
"analyzer": "snowball",
"type": "string",
"index": "analyzed",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
New document to percolate:
{
"doc": {
"full_name": "Pacman",
"company": "Arcade Game LTD",
"occupation": "hunter",
"tags.raw": ["Computer Games"]
}
}
When I try to match the document above against tags.raw, still no matches are found.
I analyzed the tags.raw field, but it looks like it still creates the tokens computer, games and running.
I guess you are using an implicit mapping (the default analyzer) or some other analyzer for your tags field. That means the data ("Computer Games" in your case) is broken into tokens and is no longer available for a terms search, because in the index it is now represented as something like computer + game.
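To illustrate the point above, here is a rough Python sketch of why a terms query misses on an analyzed field. It is only a toy model, not Elasticsearch code: `simple_analyze` is a hypothetical stand-in for a real analyzer (it just lowercases and splits on whitespace, while the standard or snowball analyzers do more), and `index_values` and `terms_query_matches` are made-up helper names.

```python
def simple_analyze(text):
    """Hypothetical stand-in for an analyzer: lowercase, split on whitespace."""
    return text.lower().split()


def index_values(values, analyzed):
    """Return the set of terms that would end up in the index for a field."""
    terms = set()
    for value in values:
        if analyzed:
            # Analyzed field: each value is broken into separate tokens.
            terms.update(simple_analyze(value))
        else:
            # not_analyzed field: the whole string is kept as a single term.
            terms.add(value)
    return terms


def terms_query_matches(indexed_terms, wanted):
    """A terms query compares its values verbatim against the indexed terms."""
    return any(term in indexed_terms for term in wanted)


tags = ["Computer Games"]
wanted = ["Computer Games", "Electronic Sports"]

analyzed_terms = index_values(tags, analyzed=True)
raw_terms = index_values(tags, analyzed=False)

print(sorted(analyzed_terms))                       # ['computer', 'games']
print(terms_query_matches(analyzed_terms, wanted))  # False
print(terms_query_matches(raw_terms, wanted))       # True
```

In this model the analyzed index never contains the literal string "Computer Games", so the verbatim comparison fails, while the unanalyzed variant matches.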
To do term matching on strings, you need to either map them as not_analyzed (to prevent them from being split into tokens), like
PUT so/pacman/_mapping
{
"pacman": {
"properties": {
"tags": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
or make your tags field a multi-field, like
PUT so/pacman/_mapping
{
"pacman": {
"properties": {
"tags": {
"type": "string",
"index": "analyzed",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
and query documents with
GET so/pacman/_search
{
"query": {
"terms": {
"tags.raw": [
"Computer Games",
"Running"
],
"minimum_match": 1
}
}
}
This approach lets you perform both full-text searches and term searches.
Regarding your Update 1: once you have put the correct mapping and registered a percolator query like:
PUT so/.percolator/1
{
"query": {
"terms": {
"tags.raw": [
"Computer Games",
"Maze running"
]
}
}
}
you need to index/percolate documents in a format like
GET so/pacman/_percolate
{
"doc": {
"full_name": "Pacman",
"company": "Arcade Game LTD",
"occupation": "hunter",
"tags": ["Computer Games"]
}
}
What is happening here: you're indexing/percolating a document with the field tags (without any mention of raw or whatever multi-field you have). ES takes this field from the JSON, adds tags.raw to the index (as the whole string), and at the same time breaks it down into analyzed tokens and puts them in the tags field (the process is much more complicated, but let's skip that for the sake of simplicity). So you don't need to manage any internals of this field yourself; you've already done that in your mapping.
And when the percolator runs, it looks for the tags.raw field in the index (because you created a terms query on this "subfield"), leaving the analyzed one untouched.
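The multi-field behaviour described above can be sketched in plain Python. Again this is a simplified model under stated assumptions, not how Elasticsearch is implemented: `simple_analyze`, `index_document` and `terms_query` are hypothetical names, and the analyzer only lowercases and splits on whitespace.

```python
def simple_analyze(text):
    """Hypothetical stand-in for an analyzer: lowercase, split on whitespace."""
    return text.lower().split()


def index_document(doc, multi_fields):
    """Toy model of multi-field indexing: one source field feeds several index fields."""
    index = {}
    for field, values in doc.items():
        if isinstance(values, str):
            values = [values]
        # Main field: analyzed tokens.
        index[field] = {tok for value in values for tok in simple_analyze(value)}
        # Sub-fields (e.g. "raw"): the original strings, kept whole.
        for sub in multi_fields.get(field, []):
            index[field + "." + sub] = set(values)
    return index


def terms_query(index, field, wanted):
    """True if any wanted value is stored verbatim under the given index field."""
    return bool(index.get(field, set()) & set(wanted))


# The document only mentions "tags"; the mapping decides that "tags.raw" exists.
doc = {"occupation": "hunter", "tags": ["Computer Games"]}
index = index_document(doc, multi_fields={"tags": ["raw"]})

wanted = ["Computer Games", "Maze running"]
print(terms_query(index, "tags.raw", wanted))  # True
print(terms_query(index, "tags", wanted))      # False
```

The document is supplied with a plain tags field, and the query targets tags.raw, which mirrors the percolator setup above.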