Sorry for cross posting.The following question is also posted on Elastic Search's google group.
In short I am trying to find out why I am not able to get optimal performance while doing searches on a ES index which contains about 1.5 millon records.
Currently I am able to get about 500-1000 searches in 2 seconds. I would think that this should be orders of magnitudes faster. Also currently I am not using thrift.
Here is how I am checking the performance.
Using 0.19.1 version of pyes (tried both stable and dev version from github) Using 0.13.8 version of requests
conn = ES(['localhost:9201'],timeout=20,bulk_size=1000)
loop_start = time.clock()
q1 = TermQuery("tax_name","cellvibrio")
for x in xrange(1000000):
if x % 1000 == 0 and x > 0:
loop_check_point = time.clock()
print 'took %s secs to search %d records' % (loop_check_point-loop_start,x)
results = conn.search(query=q1)
if results:
for r in results:
pass
# print len(results)
else:
pass
Appreciate any help that you can give to help me scaleup the searches.
Thanks!
Isn't it just a matter of concurrency?
You're doing all your queries in sequence. So a query has to finish before the next one can come in to play. If you have a 1ms RTT to the server, this will limit you to 1000 requests per second.
Try to run a few instances of your script in parallel and see what kind of performance you got.