elasticsearchpoint-in-timenes

Fetch data from a middle of a big stack using searchAfter(jump to a specific page,)


I have a large data set around 25million records I am using searchAfter with PointInTime to walk through the data My question is there a way where I can skip records over the limit of 10000

index.max_result_window

and start picking the records for example from 100,000 up to 105,000

right now I am sending multiple requests to Elasticsearch until I reach the desired point but it is not efficient and it is consuming a lot of time

Here is how I did it : I calculated how many pages I needed to do the pagination.
Then the user will send a request with page number i.e number 3. So in this case only when I reach the desired page I will set the source to true. this I best I managed to do to improve the performance and reduce the response size for none required pages

 int numberOfPages =  Pagination.GetTotalPages(totalCount, _size);

 var pitResponse = await _esClient.OpenPointInTimeAsync(content._index, p => p.KeepAlive("2m"));

            if (pitResponse.IsValid)
            {
                IEnumerable<object> lastHit = null;

                    for (int round = 0; round < numberOfPages; round++)
                    {
                        bool fetchSource = round == requiredPage;
                        var response = await _esClient.SearchAsync<ProductionDataItem>(s => s
                            .Index(content._index)
                            .Size(10000)
                            .Source(fetchSource)
                            .Query(query)
                            .PointInTime(pitResponse.Id)
                            .Sort(srt => {
                                if (content.Sort == 1) { srt.Ascending(sortBy); }
                                else { srt.Descending(sortBy); }
                                return srt; })
                            .SearchAfter(lastHit)
                        );

                        if (fetchSource)
                        {
                           itemsList.AddRange(response.Documents.ToList());
                            break;
                        }
                        lastHit = response.Hits.Last().Sorts;
                    }
                }
                //Closing PIT
                await _esClient.ClosePointInTimeAsync(p => p.Id(pitResponse.Id));

Solution

  • I think the best way to do it

    by keeping scrolling via Point in time and only loading the result when the desired page is reached by using the .source(bool)