python elasticsearch pyes

Elasticsearch performance using pyes


Sorry for cross-posting. The following question is also posted on the Elasticsearch Google group.

In short, I am trying to find out why I am not getting optimal performance when searching an ES index that contains about 1.5 million records.

Currently I get about 500-1000 searches in 2 seconds. I would think this should be orders of magnitude faster. Also, I am currently not using Thrift.

Here is how I am checking the performance.

Using version 0.19.1 of pyes (tried both the stable and dev versions from GitHub) and version 0.13.8 of requests.

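import time

from pyes import ES, TermQuery

conn = ES(['localhost:9201'], timeout=20, bulk_size=1000)
# Wall-clock time: on Unix, time.clock() reports CPU time,
# which excludes the time spent waiting on the server.
loop_start = time.time()
q1 = TermQuery("tax_name", "cellvibrio")
for x in xrange(1000000):
    # Report progress every 1000 searches.
    if x % 1000 == 0 and x > 0:
        loop_check_point = time.time()
        print 'took %s secs to search %d records' % (loop_check_point - loop_start, x)

    results = conn.search(query=q1)
    if results:
        # Iterate over the hits so the result set is consumed.
        for r in results:
            pass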

I would appreciate any help in scaling up the searches.

Thanks!


Solution

  • Isn't it just a matter of concurrency?

    You're running all your queries in sequence, so each query has to finish before the next one can start. If you have a 1 ms round-trip time (RTT) to the server, this limits you to at most 1000 requests per second, no matter how fast the server is.

    Try running a few instances of your script in parallel and see what kind of performance you get.
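To illustrate the point about sequential requests, here is a minimal sketch that does not talk to Elasticsearch at all. The function `fake_search` is a hypothetical stand-in for a blocking call such as `conn.search(query=q1)`, and the 5 ms delay in `SIMULATED_RTT` is an assumed value, not a measurement. The sketch uses a thread pool rather than several copies of the script, which is a swapped-in way of getting the same kind of concurrency on the client side.

```python
import time
from multiprocessing.pool import ThreadPool

# Assumed round-trip time per request (hypothetical value for illustration).
SIMULATED_RTT = 0.005


def fake_search(_):
    # Hypothetical placeholder for a blocking network call such as
    # conn.search(query=q1). Sleeping releases the GIL, much like
    # waiting on a socket does.
    time.sleep(SIMULATED_RTT)
    return 1


def run(num_requests, workers):
    # Issue num_requests calls using the given number of worker threads
    # and return (completed requests, elapsed wall-clock seconds).
    pool = ThreadPool(workers)
    start = time.time()
    completed = sum(pool.map(fake_search, range(num_requests)))
    elapsed = time.time() - start
    pool.close()
    pool.join()
    return completed, elapsed


if __name__ == "__main__":
    for workers in (1, 8):
        completed, elapsed = run(200, workers)
        print("%d worker(s): %d requests in %.2f s (%.0f req/s)"
              % (workers, completed, elapsed, completed / elapsed))
```

With one worker, throughput cannot exceed 1 / `SIMULATED_RTT` requests per second, because each call waits for the previous one. With several workers the waits overlap, so throughput should rise roughly with the worker count until the server or the client becomes the bottleneck. Against a real cluster, each thread would probably need its own connection object unless the client is known to be thread-safe.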