.net  elasticsearch  nest  elasticsearch-net

NEST 7: How to get the number of occurrences for each document?


I'm using NEST 7.0 and C# to run search queries against Elasticsearch using the Fluent DSL. I use MultiMatch to search for a given string value across several fields:

queryContainer &= Query<Document>.MultiMatch(m => m.Fields(fields)
                                                   .Query(searchParams.SearchValue)
                                                   .Type(TextQueryType.MostFields));

For each document I receive its _score and Source data. I can get both from Response.Hits.

But how can I get the number of occurrences of the search value in each document? I'd like to receive something like this:

Search value: "search"
Search fields: title, description
Results:
- Doc1: 5 occurrences
- Doc2: 0 occurrences
- Doc3: 3 occurrences
- Doc4: 1 occurrence
...

Thanks in advance for your help!


Solution

  • There is no direct way to do this in Elasticsearch. The closest thing is the multi term vectors API (_mtermvectors).

    Query (put the _id of every document you want counts for in "ids"):

    POST /index51/_mtermvectors
    {
        "ids" : ["1", "2"],
        "parameters": {
            "fields": [
                "text"
            ],
            "term_statistics": true
        }
    }
    

    It returns each requested document with statistics for every term in the field. The term_freq value is the number of times that term occurs in that field of that document.

    Result:

    {
      "docs" : [
        {
          "_index" : "index51",
          "_type" : "_doc",
          "_id" : "1",
          "_version" : 2,
          "found" : true,
          "took" : 3,
          "term_vectors" : {
            "text" : {
              "field_statistics" : {
                "sum_doc_freq" : 7,
                "doc_count" : 3,
                "sum_ttf" : 7
              },
              "terms" : {
                "another" : {
                  "doc_freq" : 2,
                  "ttf" : 2,
                  "term_freq" : 1,
                  "tokens" : [
                    {
                      "position" : 0,
                      "start_offset" : 0,
                      "end_offset" : 7
                    }
                  ]
                },
                "test" : {
                  "doc_freq" : 3,
                  "ttf" : 3,
                  "term_freq" : 1,
                  "tokens" : [
                    {
                      "position" : 2,
                      "start_offset" : 16,
                      "end_offset" : 20
                    }
                  ]
                },
                "twitter" : {
                  "doc_freq" : 2,
                  "ttf" : 2,
                  "term_freq" : 1,
                  "tokens" : [
                    {
                      "position" : 1,
                      "start_offset" : 8,
                      "end_offset" : 15
                    }
                  ]
                }
              }
            }
          }
        },
        {
          "_index" : "index51",
          "_type" : "_doc",
          "_id" : "2",
          "_version" : 1,
          "found" : true,
          "took" : 2,
          "term_vectors" : {
            "text" : {
              "field_statistics" : {
                "sum_doc_freq" : 7,
                "doc_count" : 3,
                "sum_ttf" : 7
              },
              "terms" : {
                "test" : {
                  "doc_freq" : 3,
                  "ttf" : 3,
                  "term_freq" : 1,
                  "tokens" : [
                    {
                      "position" : 0,
                      "start_offset" : 0,
                      "end_offset" : 4
                    }
                  ]
                }
              }
            }
          }
        }
      ]
    }
    

    The ids of all documents can be fetched using the scroll API.
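
To turn a response like the one above into per-document counts, you can sum term_freq for the search terms over the fields you searched. The following is only a minimal sketch in Python, not part of NEST: the function name count_occurrences and the trimmed sample_response are made up for illustration, and it assumes the fields use an analyzer that lowercases and splits on whitespace. Term vectors contain analyzed tokens, so if your analyzer also stems or strips punctuation, the search value has to be normalized the same way before the lookup.

```python
def count_occurrences(response, search_value, fields):
    """Return {doc_id: summed term_freq of the search terms across fields}.

    `response` is the parsed JSON body of a _mtermvectors call.
    """
    # Assumption: mimic a lowercase + whitespace analyzer. Real analyzers
    # may also stem or drop punctuation, which this sketch does not handle.
    terms = set(search_value.lower().split())

    counts = {}
    for doc in response.get("docs", []):
        vectors = doc.get("term_vectors", {})
        total = 0
        for field in fields:
            # A field is absent when the document has no value for it.
            field_terms = vectors.get(field, {}).get("terms", {})
            for term in terms:
                total += field_terms.get(term, {}).get("term_freq", 0)
        counts[doc["_id"]] = total
    return counts


# Trimmed copy of the response shown above (only the keys that are read).
sample_response = {
    "docs": [
        {"_id": "1", "term_vectors": {"text": {"terms": {
            "another": {"term_freq": 1},
            "test": {"term_freq": 1},
            "twitter": {"term_freq": 1},
        }}}},
        {"_id": "2", "term_vectors": {"text": {"terms": {
            "test": {"term_freq": 1},
        }}}},
    ]
}

print(count_occurrences(sample_response, "twitter", ["text"]))
# prints {'1': 1, '2': 0}
```

Documents that do not contain the term simply get 0, which matches the "Doc2: 0 occurrences" line in the question. The same loop can be written in C# over the deserialized response once the term vectors have been fetched.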