
How to handle multiple updates / deletes with Elasticsearch?


I need to update or delete several documents.

When I update I do this:

  1. I first search for the documents with a large result limit (say, size: 10000).
  2. For each returned document, I modify certain values.
  3. I resend the whole modified list to Elasticsearch (bulk index).

I repeat this until step 1 no longer returns results.

When I delete I do this:

  1. I first search for the documents with a large result limit (say, size: 10000).
  2. I delete each document found by sending its _id to Elasticsearch, one request per document (10000 requests).

I repeat this until step 1 no longer returns results.

Is this the right way to make an update?

When I delete, is there a way I can send several ids to delete multiple documents at once?


Solution

  • To delete or update documents by ID, you can use the bulk API:

    Bulk API

    The bulk API makes it possible to perform many index/delete operations in a single API call. This can greatly increase the indexing speed.

    The possible actions are index, create, delete and update. index and create expect a source on the next line, and have the same semantics as the op_type parameter to the standard index API (i.e. create will fail if a document with the same index and type exists already, whereas index will add or replace a document as necessary). delete does not expect a source on the following line, and has the same semantics as the standard delete API. update expects that the partial doc, upsert and script and its options are specified on the next line.

    http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-bulk.html
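As a rough illustration of the format quoted above, here is a minimal Python sketch that builds a bulk request body mixing delete and update actions. The helper name build_bulk_body, the index name my-index, and the status field are hypothetical; the sketch also assumes a recent Elasticsearch version where actions carry only _index and _id (older versions also expected a _type). The resulting string would be POSTed to the _bulk endpoint.

```python
import json


def build_bulk_body(index, deletes=(), updates=None):
    """Build a newline-delimited JSON body for the _bulk endpoint.

    `deletes` is an iterable of document ids; `updates` maps a document id
    to the partial document to merge into it. Both names are illustrative.
    """
    lines = []
    for doc_id in deletes:
        # A delete action is a single line; no source line follows it.
        lines.append(json.dumps({"delete": {"_index": index, "_id": doc_id}}))
    for doc_id, partial in (updates or {}).items():
        # An update action is followed by a line holding the partial doc.
        lines.append(json.dumps({"update": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps({"doc": partial}))
    # The bulk body must be terminated by a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body(
    "my-index",
    deletes=["1", "2"],
    updates={"3": {"status": "archived"}},
)
print(body)
```

With this shape, the 10000 separate delete requests from the question collapse into one request per batch of ids.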

    You can also delete by query instead:

    Delete By Query API

    The delete by query API allows to delete documents from one or more indices and one or more types based on a query. The query can either be provided using a simple query string as a parameter, or using the Query DSL defined within the request body.

    http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-delete-by-query.html
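A delete-by-query call could be prepared along these lines. This is only a sketch: it assumes a recent Elasticsearch version, where the endpoint is POST /<index>/_delete_by_query (older 1.x releases used a DELETE request on /<index>/_query instead), and the host, index name, and status field are placeholders. The request is built but not sent.

```python
import json
import urllib.request


def delete_by_query_request(base_url, index, query):
    """Prepare (but do not send) a delete-by-query HTTP request.

    `query` is a Query DSL fragment, e.g. {"term": {...}}.
    """
    url = f"{base_url}/{index}/_delete_by_query"
    payload = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = delete_by_query_request(
    "http://localhost:9200",          # assumed local node
    "my-index",                       # hypothetical index name
    {"term": {"status": "obsolete"}}, # hypothetical field and value
)
print(req.get_method(), req.full_url)
# To actually run the deletion: urllib.request.urlopen(req)
```

This removes the need to search for ids first: the server finds and deletes the matching documents itself.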