
How can I do a scrolled search on MetaCPAN?


I'm trying to convert this script from the older (now deprecated) ElasticSearch.pm to the new official Elasticsearch client, but I can't get the scrolled search to work. Here's what I've got:

#! /usr/bin/perl

use strict;
use warnings;
use 5.010;

use Elasticsearch ();
use Elasticsearch::Scroll ();

my $es = Elasticsearch->new(
  nodes => 'http://api.metacpan.org:80',
  cxn   => 'NetCurl',
  cxn_pool => 'Static::NoPing',
  #log_to   => 'Stderr',
  #trace_to => 'Stderr',
);

say 'Getting all results at once works:';
my $results = $es->search(
  index => 'v0',
  type  => 'release',
  body  => {
    filter => { range => { date => { gte => '2013-11-28T00:00:00.000Z' } } },
    fields => [qw(author archive date)],
  },
);

foreach my $hit (@{ $results->{hits}{hits} }) {
  my $field = $hit->{fields};
  say "@$field{qw(date author archive)}";
}

say "\nUsing a scrolled search does not work:";
my $scroller = Elasticsearch::Scroll->new(
  es          => $es,
  index       => 'v0',
  search_type => 'scan',
  size        => 100,
  type        => 'release',
  body => {
    filter => { range => { date => { gte => '2013-11-28T00:00:00.000Z' } } },
    fields => [qw(author archive date)],
  },
);

while (my $hit = $scroller->next) {
  my $field = $hit->{fields};
  say "@$field{qw(date author archive)}";
} # end while $hit

The first search, which fetches all the results in one chunk, works fine. But the second search, which tries to scroll through the results, produces:

Using a scrolled search does not work:
[Request] ** [http://api.metacpan.org:80]-[500]
ActionRequestValidationException[Validation Failed: 1: scrollId is missing;],
called from sub Elasticsearch::Transport::try {...}
at .../Try/Tiny.pm line 83. With vars: {'body' =>
'ActionRequestValidationException[Validation Failed: 1: scrollId is missing;]',
'request' => {'path' => '/_search/scroll','serialize' => 'std',
'body' => 'c2Nhbjs1OzE3MjU0NjM2MjowakFELUU3VFFibTJIZW1ibUo0SUdROzE3MjU0NjM2NDowakFELUU3VFFibTJIZW1ibUo0SUdROzE3MjU0NjM2MTowakFELUU3VFFibTJIZW1ibUo0SUdROzE3MjU0NjM2MDowakFELUU3VFFibTJIZW1ibUo0SUdROzE3MjU0NjM2MzowakFELUU3VFFibTJIZW1ibUo0SUdROzE7dG90YWxfaGl0czoxNDQ7',
'method' => 'GET','qs' => {'scroll' => '1m'},'ignore' => [],
'mime_type' => 'application/json'},'status_code' => 500}

What am I doing wrong? I'm using Elasticsearch 0.75 and Elasticsearch-Cxn-NetCurl 0.02, and Perl 5.18.1.


Solution

  • I finally got it working with the newer official Search::Elasticsearch client and its built-in scroll_helper method. Here's the short version:

    #! /usr/bin/perl
    
    use strict;
    use warnings;
    use 5.010;
    
    use Search::Elasticsearch ();
    
    my $es = Search::Elasticsearch->new(
      cxn_pool => 'Static::NoPing',
      nodes    => 'api.metacpan.org:80',
    );
    
    my $scroller = $es->scroll_helper(
      index       => 'v0',
      type        => 'release',
      search_type => 'scan',
      scroll      => '2m',
      size        => 100,
      body        => {
        fields => [qw(author archive date)],
        query  => { range => { date => { gte => '2015-02-01T00:00:00.000Z' } } },
      },
    );
    
    while (my $hit = $scroller->next) {
      my $field = $hit->{fields};
      say "@$field{qw(date author archive)}";
    } # end while $hit
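A scrolled search is a loop: each request returns a page of hits plus a scroll ID, and the next request sends that ID back to get the following page, until a page comes back empty. The `scroll => '2m'` parameter tells the server how long to keep the search context alive between those requests. A helper like scroll_helper hides that loop behind `next`. The plain-Perl sketch below shows the buffering pattern only; it is not Search::Elasticsearch's actual implementation, and `make_scroller` and `fetch_page` are hypothetical names, with `fetch_page` standing in for a request to `/_search/scroll`.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;

# Fake "server" data: five hits in the same shape as the real ones.
my @all_hits = map {
    +{ fields => { date    => "2015-02-0${_}T00:00:00.000Z",
                   author  => "AUTHOR$_",
                   archive => "Dist-$_.tar.gz" } }
} 1 .. 5;

# Hypothetical stand-in for a scroll request. Here the "scroll ID" is just
# an offset; it returns a page of up to 2 hits and the ID for the next call.
sub fetch_page {
    my ($scroll_id) = @_;
    my @page = grep { defined } @all_hits[$scroll_id .. $scroll_id + 1];
    return (\@page, $scroll_id + @page);
}

# Returns an iterator that hands out one hit at a time, refilling its
# buffer from the next page when empty, and returning undef when done.
sub make_scroller {
    my ($fetch) = @_;
    my $scroll_id = 0;    # a real first request would supply this ID
    my @buffer;
    my $done = 0;
    return sub {
        while (!@buffer && !$done) {
            my ($page, $next_id) = $fetch->($scroll_id);
            if (@$page) {
                @buffer    = @$page;
                $scroll_id = $next_id;
            }
            else {
                $done = 1;    # an empty page means the scroll is exhausted
            }
        }
        return shift @buffer;
    };
}

my $next = make_scroller(\&fetch_page);
while (my $hit = $next->()) {
    my $field = $hit->{fields};
    say "@$field{qw(date author archive)}";
}
```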
    

    Note that a scan-type scrolled search does not sort the results, so the records come back in no particular order. I wound up dumping the records into a temporary database and sorting them locally. The updated script is on GitHub.
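If the whole result set fits in memory, a simpler alternative to a temporary database is to collect the hits into an array and sort them in Perl. Because the dates are ISO 8601 timestamps in one fixed format and time zone, a plain string comparison (`cmp`) orders them chronologically. The sample hits below are made up for illustration; in the real script they would be pushed from `$scroller->next`.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;

# Made-up sample hits, shaped like the ones the scroller returns.
my @hits = (
    { fields => { date => '2015-02-03T10:00:00.000Z', author => 'BOB',   archive => 'Foo-1.01.tar.gz' } },
    { fields => { date => '2015-02-01T08:30:00.000Z', author => 'ALICE', archive => 'Bar-0.02.tar.gz' } },
    { fields => { date => '2015-02-02T23:59:59.000Z', author => 'CAROL', archive => 'Baz-2.00.tar.gz' } },
);
# In the real script: my @hits; while (my $hit = $scroller->next) { push @hits, $hit }

# Fixed-format ISO 8601 timestamps sort correctly as strings.
my @sorted = sort { $a->{fields}{date} cmp $b->{fields}{date} } @hits;

foreach my $hit (@sorted) {
    my $field = $hit->{fields};
    say "@$field{qw(date author archive)}";
}
```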