opensearch amazon-opensearch

Disable automatic ID generation on OpenSearch


When indexing a document, OpenSearch lets you create it with a POST to the index/_doc API, which automatically generates a document ID if none is provided.

Is there a configuration option to disable this, so that the only way to create a document is to supply an explicit document ID?


Solution

  • OpenSearch has no built-in setting that forces clients to supply a document ID when indexing. By default, when you call POST /index/_doc without an ID, OpenSearch generates one for you.

    In Elasticsearch or OpenSearch, you can use an ingest pipeline to reject a document when its _id is missing.

    PUT _ingest/pipeline/require_id_pipeline
    {
      "description": "Ensure document has an _id field, reject indexing if not",
      "processors": [
        {
          "script": {
            "source": """
              if (ctx._id == null || ctx._id == '') {
                throw new IllegalArgumentException('Document must have an _id');
              }
            """
          }
        }
      ]
    }
    

    POST _bulk?pipeline=require_id_pipeline
    { "index": { "_index": "test_exception", "_id": 1 } }
    { "title": "Rush", "year": 2013 }
    { "index": { "_index": "test_exception", "_id": "2" } }
    { "title": "Prisoners", "year": 2013 }
    { "index": { "_index": "test_exception" } }
    { "title": "World War Z" }
    

    # Optionally, set the pipeline as the index default so that you don't need to pass ?pipeline on every request.
    PUT test_exception/_settings
    {"default_pipeline":"require_id_pipeline"}
    

    [Screenshot: ingest pipeline test]
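The condition in the pipeline script can also be mirrored on the client before a bulk request is sent. The following is only a minimal sketch and not part of the original answer: the function name `actions_missing_id` is hypothetical, it uses only the Python standard library (no OpenSearch client), and it assumes every action line is followed by exactly one source line, which holds for "index" and "create" actions but not for "delete".

```python
import json


def actions_missing_id(bulk_body: str) -> list[int]:
    """Return the 0-based positions of bulk actions that carry no _id.

    Hypothetical helper: assumes each action line is followed by exactly
    one source line (true for "index" and "create", not for "delete").
    """
    lines = [line for line in bulk_body.splitlines() if line.strip()]
    missing = []
    # Bulk bodies alternate: action metadata line, then document source line.
    for position, start in enumerate(range(0, len(lines), 2)):
        action = json.loads(lines[start])
        # Each action line has a single key such as "index" or "create".
        (metadata,) = action.values()
        doc_id = metadata.get("_id")
        # Same condition as the Painless script: null or empty string.
        if doc_id is None or doc_id == "":
            missing.append(position)
    return missing


bulk_body = """
{ "index": { "_index": "test_exception", "_id": "1" } }
{ "title": "Rush", "year": 2013 }
{ "index": { "_index": "test_exception", "_id": "2" } }
{ "title": "Prisoners", "year": 2013 }
{ "index": { "_index": "test_exception" } }
{ "title": "World War Z" }
"""

print(actions_missing_id(bulk_body))  # [2]
```

A check like this only helps clients that opt in to it, so it complements the ingest pipeline rather than replacing it; the pipeline is what enforces the rule on the server side.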