
Elasticsearch "ignore_above" issues



Index mapping (in Kibana):

PUT /new_index
{
  "mappings": {
    "properties": {
      "items": {
        "type": "nested"
      },
      "contents": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 50000
          }
        }
      },
      "library_notes": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 50000
          }
        }
      },
      "image_path": {
        "type": "text",
        "fielddata": true
      }
    }
  }
}
Relevant field mappings (excerpt):
{
  "library_language": {
    "type": "text",
    "fields": {
      "keyword": {
        "type": "keyword",
        "ignore_above": 256
      }
    }
  },
  "library_notes": {
    "type": "text",
    "fields": {
      "keyword": {
        "type": "keyword",
        "ignore_above": 50000
      }
    }
  },
  "library_subject": {
    "type": "text",
    "fields": {
      "keyword": {
        "type": "keyword",
        "ignore_above": 256
      }
    }
  }
}
Sample library_notes data (an array):
{
  "library_notes": [
    "\"The Big Picture Show, 14 September 2007 - 23 March 2008, Singapore Art Museum\"--T.p. verso.",
    "Artists: Wong Shih Yaw; Charlie Co; Entang Wiharso; Syed Thajudeen; Zakaria Omar; Somboon Hormtientong; Lee Hsin Hsin; Dang Xuan Hoa; Lim Tze Peng; Hong Sek Chern; Ferdinand Montemayor; Antonio (Tony) Leano; Wong Keen; Tan Chin Kuan; Pratuang Emjaroen; Jeremy Ramsey; Gao Xingjian; Marc Leguay; He Kongde; Edgar (Egai) Talusan Fernandez; Pacita Abad; Imelda Cajipe-Endaya; Suos Sodavy; Tin Tun Hlaing; Bayu Utomo Radjikin.",
    " In putting together The Big picture Show, the Singapore Art Museum (SAM) has taken the opportunity to bring together for display some of its largest treasures in its collection."
  ]
}
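For reference, here is a minimal Python model of what the library_notes.keyword sub-field holds for this sample. It is a simplified sketch, not Elasticsearch code: the helper keyword_terms is hypothetical and only mimics two documented behaviours of keyword fields, namely that each array element is indexed whole as one un-analyzed term, and that elements longer than ignore_above characters are not indexed.

```python
# Simplified model (not Elasticsearch code) of a keyword sub-field such as
# library_notes.keyword. Assumption: only two behaviours are modelled:
#   1. no analysis: each array element is indexed whole, as ONE term;
#   2. elements longer than ignore_above characters are not indexed
#      (they remain in _source, they just cannot be searched via this field).

IGNORE_ABOVE = 50000  # value from the library_notes mapping above

# The three elements of the sample library_notes array.
library_notes = [
    "\"The Big Picture Show, 14 September 2007 - 23 March 2008, Singapore Art Museum\"--T.p. verso.",
    "Artists: Wong Shih Yaw; Charlie Co; Entang Wiharso; Syed Thajudeen; Zakaria Omar; Somboon Hormtientong; Lee Hsin Hsin; Dang Xuan Hoa; Lim Tze Peng; Hong Sek Chern; Ferdinand Montemayor; Antonio (Tony) Leano; Wong Keen; Tan Chin Kuan; Pratuang Emjaroen; Jeremy Ramsey; Gao Xingjian; Marc Leguay; He Kongde; Edgar (Egai) Talusan Fernandez; Pacita Abad; Imelda Cajipe-Endaya; Suos Sodavy; Tin Tun Hlaing; Bayu Utomo Radjikin.",
    " In putting together The Big picture Show, the Singapore Art Museum (SAM) has taken the opportunity to bring together for display some of its largest treasures in its collection.",
]


def keyword_terms(values, ignore_above=IGNORE_ABOVE):
    """Return the terms a keyword field would index for an array of strings (hypothetical helper)."""
    # Keep each value whole; drop only values longer than ignore_above characters.
    return [v for v in values if len(v) <= ignore_above]


terms = keyword_terms(library_notes)
print(len(terms))                               # 3: ignore_above dropped nothing
print(max(len(t) for t in terms) < IGNORE_ABOVE)  # True: the longest element is far below the limit
print("Lim Tze Peng" in terms)                  # False: no term consists of just the name
print(any("Lim Tze Peng" in t for t in terms))  # True: the name sits inside the Artists term
```

In this model ignore_above never comes into play for the sample; the name simply lives in the middle of a much longer term.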
The query I used:
{
  "from": 0,
  "size": 10000,
  "track_total_hits": true,
  "sort": [
    {
      "_script": {
        "type": "number",
        "script": {
          "lang": "painless",
          "source": "doc.containsKey('image_path') && doc['image_path'].size() > 0 ? 0 : 1"
        }
      }
    },
    "_score"
  ],
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must": [
            {
              "match_phrase": {
                "category_code": "ART"
              }
            },
            {
              "bool": {
                "should": [
                  {
                    "wildcard": {
                      "library_notes.keyword": {
                        "value": "lim *ze peng",
                        "case_insensitive": true
                      }
                    }
                  },
                  {
                    "wildcard": {
                      "linking_notes.keyword": {
                        "value": "lim *ze peng",
                        "case_insensitive": true
                      }
                    }
                  }
                ]
              }
            },
            {
              "match": {
                "is_available": true
              }
            }
          ],
          "should": []
        }
      },
      "functions": [
        {
          "random_score": {},
          "weight": 1
        }
      ],
      "score_mode": "sum"
    }
  }
}

The wildcard search for "lim *ze peng" finds nothing, even though "Lim Tze Peng" appears in the library_notes field.


Solution

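  • A wildcard query has to match the entire token. library_notes.keyword is a keyword field, so each array element is indexed as a single, un-analyzed token (ignore_above only skips elements longer than 50000 characters, which is not the case here). In your case the token is the entire line: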

    "Artists: Wong Shih Yaw; Charlie Co; Entang Wiharso; Syed Thajudeen; Zakaria Omar; Somboon Hormtientong; Lee Hsin Hsin; Dang Xuan Hoa; Lim Tze Peng; Hong Sek Chern; Ferdinand Montemayor; Antonio (Tony) Leano; Wong Keen; Tan Chin Kuan; Pratuang Emjaroen; Jeremy Ramsey; Gao Xingjian; Marc Leguay; He Kongde; Edgar (Egai) Talusan Fernandez; Pacita Abad; Imelda Cajipe-Endaya; Suos Sodavy; Tin Tun Hlaing; Bayu Utomo Radjikin.",

    So, to answer your specific question, you can find it by adding leading and trailing * wildcard operators to the pattern (? matches exactly one character):

    "value": "*lim ?ze peng*",
    

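    However, I cannot give you this advice without a huge disclaimer. A pattern that starts with * is one of the slowest operations in Elasticsearch: with no fixed prefix, far more terms have to be checked against the pattern, and that gets worse on fields like yours, which hold several long, distinct values per record. There are much better alternatives for most use cases, for example a match_phrase query for "lim tze peng" on the analyzed library_notes text field. So, I would strongly encourage you to consider providing users with other options to achieve their desired results.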
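The whole-token rule can be checked outside Elasticsearch with a small Python model. This is a sketch under stated assumptions, not Lucene's implementation: wildcard_to_regex and wildcard_matches are hypothetical helpers that model only the * and ? operators (plus backslash escaping), and "case_insensitive": true is emulated with re.IGNORECASE. It shows that "lim *ze peng" fails against the full Artists line, while "*lim ?ze peng*" matches it.

```python
import re

# Simplified model (not Lucene code) of a wildcard query on a keyword field.
# Assumptions: only '*' (any run of characters, possibly empty), '?' (exactly
# one character) and backslash escaping are modelled; case_insensitive=true
# is emulated with re.IGNORECASE.


def wildcard_to_regex(pattern):
    """Translate a wildcard pattern into a regex string (hypothetical helper)."""
    parts = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == "\\" and i + 1 < len(pattern):
            parts.append(re.escape(pattern[i + 1]))  # escaped char is literal
            i += 2
            continue
        if c == "*":
            parts.append(".*")
        elif c == "?":
            parts.append(".")
        else:
            parts.append(re.escape(c))
        i += 1
    return "".join(parts)


def wildcard_matches(pattern, term, case_insensitive=True):
    """True only if the pattern matches the WHOLE term (hypothetical helper)."""
    flags = re.DOTALL | (re.IGNORECASE if case_insensitive else 0)
    return re.fullmatch(wildcard_to_regex(pattern), term, flags) is not None


# The whole Artists element is a single term in library_notes.keyword.
artists_term = "Artists: Wong Shih Yaw; Charlie Co; Entang Wiharso; Syed Thajudeen; Zakaria Omar; Somboon Hormtientong; Lee Hsin Hsin; Dang Xuan Hoa; Lim Tze Peng; Hong Sek Chern; Ferdinand Montemayor; Antonio (Tony) Leano; Wong Keen; Tan Chin Kuan; Pratuang Emjaroen; Jeremy Ramsey; Gao Xingjian; Marc Leguay; He Kongde; Edgar (Egai) Talusan Fernandez; Pacita Abad; Imelda Cajipe-Endaya; Suos Sodavy; Tin Tun Hlaing; Bayu Utomo Radjikin."

print(wildcard_matches("lim *ze peng", artists_term))    # False: the term starts with "Artists:"
print(wildcard_matches("*lim ?ze peng*", artists_term))  # True
print(wildcard_matches("lim *ze peng", "Lim Tze Peng"))  # True: only a term that is just the name would match
```

Note that fullmatch anchors the pattern at both ends, which is the behaviour that makes the leading and trailing * necessary on a keyword field.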