I am using Elasticsearch to index some data, but the performance is poor.
There are only 3000 entries, each with 6 fields, yet indexing them takes 5 minutes (roughly 100 ms per entry).
Because I am new to Elasticsearch, my code and program flow are basic. For each entry I search for it, index it if it does not exist yet, and otherwise increment its Count. The code is as follows:
import pyes

conn = pyes.ES('server:9200')

# Search: check whether the entry is already indexed
searchResult = conn.search(searchDict, indexName, TypeName)

# Index: store a new entry
conn.index(storeDict, indexName, TypeName, id)

# Update: increment the Count field of an existing entry
conn.partial_update(indexName, TypeName, id, "ctx._source.Count += counter", params={"counter": 1})
Is there any way to improve the performance of my code?
Thank you for your help.
You don't need to search before updating. Read the Elasticsearch docs on the Update API and scroll down to the upsert section.
upsert is a parameter that holds a document to index if the document does not exist on the server; otherwise the upsert document is ignored and the request works like a normal update request (as you are doing now).
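As a rough sketch of what a single upsert request can look like, the following builds one Update API call using only Python's standard library. It assumes Elasticsearch 7.x or later (the POST /<index>/_update/<id> endpoint and Painless scripts that read parameters through params.*); the older server that pyes talks to uses a different URL and script syntax. The helper build_upsert_request, the index name, the document ID and the sample document are hypothetical.

```python
import json
import urllib.request


def build_upsert_request(base_url, index_name, doc_id, store_dict, counter=1):
    """Hypothetical helper: one request that increments Count or creates the doc.

    Assumes Elasticsearch 7.x+ (POST /<index>/_update/<id>, Painless scripts
    reading parameters via params.*). Older versions use other URLs/syntax.
    """
    body = {
        # Runs only when the document already exists.
        "script": {
            "source": "ctx._source.Count += params.counter",
            "lang": "painless",
            "params": {"counter": counter},
        },
        # Indexed unchanged (the script is NOT run) when the document is missing.
        "upsert": store_dict,
    }
    url = f"{base_url}/{index_name}/_update/{doc_id}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical document; "Count" matches the field the question increments.
    doc = {"Name": "example", "Count": 1}
    req = build_upsert_request("http://localhost:9200", "myindex", "42", doc)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it to a running cluster:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp)["result"])  # "created" or "updated"
```

Compared with the search-then-index/update flow, each entry now costs one request instead of two, so 3000 entries need about 3000 round trips rather than about 6000. Because the script does not run when the document is missing, the upsert document should already carry the initial Count.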
Good luck!