mongodbindexingunique-constraintcase-insensitivecompound-index

How to Create a Case-Insensitive Unique Compound Index in MongoDB?


How to Create a Case-Insensitive Unique Compound Index in MongoDB?

Question Body: I'm trying to create a compound unique text index in MongoDB with the following fields:

tenant_id: 1 seller_id: 1 seller_sku_code: text However, I'm encountering an error when trying to create the index: Index build failed: 4df81a22-042f-4e51-bbec-b0f8a7dafe2e: Collection product_db.skus ( 83cc8978-b0fe-4c4b-ae58-691e32db7f95 ) :: caused by :: E11000 duplicate key error collection: product_db.skus index: tenant_id_1_seller_id_1_seller_sku_code_text dup key: { tenant_id: "1", seller_id: "113", _fts: "acc", _ftsx: 0.75 }

I need to ensure that the seller_sku_code is unique within the database and is case-insensitive. For example:

"ABC" and "abc" should not be allowed simultaneously. "ACC-2001" and "ACC-2000" should be allowed to coexist.

How can I create this unique, case-insensitive index on seller_sku_code while still ensuring the uniqueness of the tenant_id and seller_id fields?


Solution

  • Text index is probably not what you are looking for.

    "ACC-2001" and "ACC-2000" are strings, and contain some alphanumeric characters, but they are not text.

    To ensure that no two documents in the entire database have the same seller_sku_code, create a unique index on just that field, which a case-insensitive collation, like:

    db.collection.createIndex(
         {seller_sku_code:1},
         {
           unique:true,  
           collation:
             {
                locale : "en_US"
                strength : 1
             }
          }
    )
    

    Use the locale that is appropriate, and strength of 1 or 2 for case-insensitive.

    The problem you were encountering with the index build is the text index tokenizer sees the dash - as a separator, so instead of indexing the values "ACC-2001" it was indexing the values "ACC" and "2001". That would prevent any other document with the same tenant and seller_id from having a seller_sku_code starting with "ACC-".