php search elasticsearch full-text-search quick-search

Elasticsearch: how can I implement the following matching rules together with quick search?


My mapping is:

"current_name" => [
    "type"     => "string",
    "index"    => "analyzed",
    "analyzer" => "russian",
    "fields"   => [
        "raw"           => [
            "type"  => "string",
            "index" => "not_analyzed"
        ],
        "raw_lowercase" => [
            "type"     => "string",
            "analyzer" => "tolowercase"
        ]
    ]
],

I need to search this field so that all of the following rules apply together:

  1. Indexed string - "monkeys". I need to find this document by "monkey".

  2. Indexed string - "hello my beautiful world". I need to be able to find this document by "hello big world".

  3. Indexed string - "appropriate". I need to be able to find this document by "apropriat".

Overall: Indexed - "the Earth planet is the most beautiful in our Solar system". I want to find this document by "earth is beautifal".

All of these rules should apply while the user is typing the query (quick search). The language is Russian.

Optional: 1) Indexed - "great job". I want to find the document by the synonym "good". 2) Indexed - "beautiful world". I want to find it by "beaut worl".

How can I implement this? Do you have any remarks about combining these rules with quick search?


Solution

  • Autosuggest considerations

    Strategies to accomplish what you're asking

    1) Indexed string - "monkeys". I need to find this document by "monkey".

    This is an example of stemming or reducing common inflections of a term to a root form.

    For example, mapping inputs of "fitted", "fitting", "fits", "fit" all to a common form, "fit".

    Stemming has to occur both for indexed terms and for query terms, so that searches for any of the inflections will yield results containing any other inflections.

    Elasticsearch ships with two Russian stemmers, russian and light_russian, which are listed in the documentation for the stemmer token filter (the entries there link to descriptions of the implementations).

    Any of the suggester implementations can be parameterized with a custom analyzer. By default, they use the analyzer defined in the mapping for the field being suggested.
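
To make the stemming step concrete, here is a minimal sketch of index settings that wire one of the two stemmers into a custom analyzer. It assumes a pre-5.0 Elasticsearch (as the `string` / `not_analyzed` mapping in the question suggests); the names `russian_stop`, `russian_stemmer` and `russian_custom` are made up for illustration, and the function only builds the settings array without talking to a cluster.

```php
<?php
// Sketch only: builds the "settings" part of an index-creation request.
// The filter and analyzer names are hypothetical; "stop", "stemmer",
// "lowercase" and the "standard" tokenizer are built into Elasticsearch.
function russianAnalysisSettings(string $stemmer = 'russian'): array
{
    return [
        'analysis' => [
            'filter' => [
                'russian_stop' => [
                    'type'      => 'stop',
                    'stopwords' => '_russian_',
                ],
                'russian_stemmer' => [
                    'type'     => 'stemmer',
                    'language' => $stemmer, // "russian" or "light_russian"
                ],
            ],
            'analyzer' => [
                'russian_custom' => [
                    'type'      => 'custom',
                    'tokenizer' => 'standard',
                    // The same chain runs at index time and at query time,
                    // so "monkeys" and "monkey" reduce to the same token.
                    'filter'    => ['lowercase', 'russian_stop', 'russian_stemmer'],
                ],
            ],
        ],
    ];
}

echo json_encode(russianAnalysisSettings('light_russian'), JSON_PRETTY_PRINT), PHP_EOL;
```

Switching the argument between `russian` and `light_russian` is an easy way to compare how aggressively each stemmer conflates word forms on your own data.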

    2) Indexed string - "hello my beautiful world". I need to be able to find this document by "hello big world".

    One solution is simply a boolean search: hello OR big OR world. The Elasticsearch match query uses the OR operator by default, so the query "hello big world" would find the document "hello my beautiful world" (assuming "hello" and "world" are tokens generated by the searched field's analyzer).

    Another option would be to use the phrase suggester to piece together the matching terms in the query (with max_errors >= 0.5, so that the terms "my beautiful" could be considered misspellings).
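
The boolean approach can be sketched as a request body like the one below. The function name and the `minimum_should_match` default are assumptions made for illustration; `current_name` is the field from the question's mapping.

```php
<?php
// Sketch only: a match query that accepts documents containing any of the
// query terms. Raising $minimumShouldMatch (e.g. to "2" or "66%") is an
// optional way to require more overlap than a single shared term.
function matchAnyTermQuery(string $text, string $minimumShouldMatch = '1'): array
{
    return [
        'query' => [
            'match' => [
                'current_name' => [
                    'query'                => $text,
                    'operator'             => 'or', // the default, spelled out
                    'minimum_should_match' => $minimumShouldMatch,
                ],
            ],
        ],
    ];
}

echo json_encode(matchAnyTermQuery('hello big world', '2'), JSON_PRETTY_PRINT), PHP_EOL;
```

With "hello big world" and a minimum of two matching terms, a document containing "hello" and "world" still qualifies even though "big" is absent.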

    3) Indexed string - "appropriate". I need to be able to find this document by "apropriat".

    You're describing a fuzzy search. This search provides 1-2 characters of leniency in the spelling of a term, and would certainly help chronic misspellers and poor typists.

    Both the completion suggester (which only needs a word prefix to provide suggestions) and the term suggester (which only suggests based on entire terms being entered) let you specify fuzziness, or leniency in the "edit distance" between the query and the field value.
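
Outside the suggesters, the same leniency is available on an ordinary match query through its `fuzziness` parameter. The sketch below is an illustration under that assumption; the function name and defaults are hypothetical, and how well it works after stemming is something to verify on real data.

```php
<?php
// Sketch only: a match query that tolerates small spelling differences.
// "AUTO" lets Elasticsearch pick the allowed edit distance from the term
// length; an explicit 1 or 2 can be passed instead.
function fuzzyMatchQuery(string $text, $fuzziness = 'AUTO', int $prefixLength = 0): array
{
    return [
        'query' => [
            'match' => [
                'current_name' => [
                    'query'         => $text,
                    'fuzziness'     => $fuzziness,
                    // Number of leading characters that must match exactly.
                    'prefix_length' => $prefixLength,
                ],
            ],
        ],
    ];
}

echo json_encode(fuzzyMatchQuery('apropriat'), JSON_PRETTY_PRINT), PHP_EOL;
```

A non-zero `prefix_length` usually makes fuzzy matching cheaper, at the cost of missing typos in the first characters.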

    Overall: Indexed - "the Earth planet is the most beautiful in our Solar system". I want to find this document by "earth is beautifal".

    Optional: 1) Indexed - "great job". I want to find the document by the synonym "good". 2) Indexed - "beautiful world". I want to find it by "beaut worl".

    (Overall) The phrase suggester may not be able to suggest "the Earth planet is the most beautiful in our Solar system" given the typed phrase "earth is beautifal". This is because there are a number of unrelated terms separating "earth" and "beautiful" in the source document. A phrase search with slop set to allow a gap of, say, four terms (as in the example) would satisfy this requirement, but you'd have to execute a (slower) search request inside your completion logic.
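
A phrase search with slop could look roughly like this. The function name and the default of four are assumptions for illustration; note that slop only relaxes term positions, so the misspelling "beautifal" would still need separate handling (for example a fuzzy query).

```php
<?php
// Sketch only: a match_phrase query that allows up to $slop positions of
// movement between the query terms.
function sloppyPhraseQuery(string $text, int $slop = 4): array
{
    return [
        'query' => [
            'match_phrase' => [
                'current_name' => [
                    'query' => $text,
                    'slop'  => $slop,
                ],
            ],
        ],
    ];
}

echo json_encode(sloppyPhraseQuery('earth is beautiful'), JSON_PRETTY_PRINT), PHP_EOL;
```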

    (Optional 1) Synonyms can be included in your analyzer through a synonym token filter. I would split-test this thoroughly, though, as searchers may not expect to see synonyms in their suggestions.
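
As a rough sketch of what including synonyms in the analyzer means, the settings below add a `synonym` token filter to a custom analyzer. The names `name_synonyms` and `synonym_analyzer` and the sample synonym group are hypothetical; a real list for Russian would have to be curated separately.

```php
<?php
// Sketch only: analysis settings with an inline synonym list.
// Each entry is a comma-separated group of equivalent terms.
function synonymAnalysisSettings(array $synonymGroups): array
{
    return [
        'analysis' => [
            'filter' => [
                'name_synonyms' => [
                    'type'     => 'synonym',
                    'synonyms' => array_values($synonymGroups),
                ],
            ],
            'analyzer' => [
                'synonym_analyzer' => [
                    'type'      => 'custom',
                    'tokenizer' => 'standard',
                    // Lowercase first so the synonym rules match regardless of case.
                    'filter'    => ['lowercase', 'name_synonyms'],
                ],
            ],
        ],
    ];
}

echo json_encode(synonymAnalysisSettings(['great, good']), JSON_PRETTY_PRINT), PHP_EOL;
```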

    (Optional 2) I doubt the completion suggester will complete multiple terms like "beaut worl"; you may have to use edge n-grams. Practically speaking, however, I doubt anyone will ever type this, even accidentally.
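
To illustrate the edge n-gram idea, the sketch below pairs a small local helper, which shows which prefixes would be indexed for a term, with analysis settings that use the built-in `edge_ngram` token filter. All names (`edgeNgrams`, `prefix_grams`, `prefix_index`, `prefix_search`) and the gram sizes are assumptions for illustration.

```php
<?php
// Local illustration: the leading substrings an edge n-gram filter would
// emit for one term. Splits on UTF-8 characters so Cyrillic text works.
function edgeNgrams(string $term, int $min, int $max): array
{
    $chars = preg_split('//u', $term, -1, PREG_SPLIT_NO_EMPTY);
    $grams = [];
    $limit = min($max, count($chars));
    for ($len = $min; $len <= $limit; $len++) {
        $grams[] = implode('', array_slice($chars, 0, $len));
    }
    return $grams;
}

// Sketch only: index-time analyzer that stores prefixes, plus a plain
// search-time analyzer so the typed text itself is not split into grams.
function edgeNgramAnalysisSettings(int $min = 2, int $max = 10): array
{
    return [
        'analysis' => [
            'filter' => [
                'prefix_grams' => [
                    'type'     => 'edge_ngram',
                    'min_gram' => $min,
                    'max_gram' => $max,
                ],
            ],
            'analyzer' => [
                'prefix_index' => [
                    'type'      => 'custom',
                    'tokenizer' => 'standard',
                    'filter'    => ['lowercase', 'prefix_grams'],
                ],
                'prefix_search' => [
                    'type'      => 'custom',
                    'tokenizer' => 'standard',
                    'filter'    => ['lowercase'],
                ],
            ],
        ],
    ];
}

print_r(edgeNgrams('world', 2, 4)); // wo, wor, worl
```

Because "worl" and "beaut" are among the indexed prefixes of "world" and "beautiful", a match query with the AND operator against such a field could find "beautiful world" from "beaut worl".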


    Multiple suggester types can be requested within a _suggest call. You may end up running with a combination of completion and phrase suggesters to cover all of your bases.
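
A combined request might be sketched as follows, assuming the pre-5.0 suggest syntax. The suggestion names and the `current_name_suggest` field are hypothetical; the completion suggester needs a separate field mapped with type `completion`, which the question's mapping does not yet have.

```php
<?php
// Sketch only: one request body carrying three named suggesters.
function suggestRequestBody(string $text): array
{
    return [
        'name_terms' => [
            'text' => $text,
            'term' => [
                'field'     => 'current_name',
                'max_edits' => 2, // per-term edit distance (1 or 2)
            ],
        ],
        'name_phrase' => [
            'text'   => $text,
            'phrase' => [
                'field'      => 'current_name',
                'max_errors' => 0.5, // share of terms that may be misspelled
            ],
        ],
        'name_completion' => [
            'text'       => $text,
            'completion' => [
                'field' => 'current_name_suggest', // hypothetical completion field
                'fuzzy' => ['fuzziness' => 1],
            ],
        ],
    ];
}

echo json_encode(suggestRequestBody('earth is beautifal'), JSON_PRETTY_PRINT), PHP_EOL;
```

Each named entry comes back as its own list of suggestions, so the client can decide how to merge or rank them.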