Tags: python, elasticsearch, elasticsearch-bulk-api

Getting error 'The [dims] property must be specified for field [vector].' despite dims being set in the mapping


I am trying to upload dense vectors to an Elasticsearch endpoint.

  1. Created an index with the mapping below:
        mapping = {
            "mappings": {
                "properties": {
                    "vector": {
                        "type": "dense_vector",
                        "dims": 300
                    },
                    "word": {
                        "type": "text"
                    }
                }
            }
        }

        es.indices.create(
            index="test",
            body=mapping
        )

Response received:

{'acknowledged': True, 'shards_acknowledged': True, 'index': 'test'}

  2. Created a function to upload the vectors using the bulk API, as below. model is a Python dictionary with each word as key and its associated vector as value.

        def gendata(model):
          for key, value in model.items():
            key = str(key)
            yield {
                "_index": "test",
                "_id": key,
                "_type": "document",
                "word": key,
                "vector": value
            }

I get the error below when passing gendata() to helpers.bulk().

NOTE: dims is set in the mapping, so why does the error say that dims must be specified?

BulkIndexError: ('100 document(s) failed to index.', [{'index': {'_index': 'test', '_type': 'document', '_id': 'the', 'status': 400, 'error': {'type': 'mapper_parsing_exception', 'reason': 'The [dims] property must be specified for field [vector].'}, 'data': {'word': 'the', 'vector': [0.04656, 0.21318, -0.0074364, -0.45854, -0.035639, 0.23643, -0.28836, 0.21521, -0.13486, -1.6413, -0.26091, 0.032434, 0.056621, -0.043296, -0.021672, 0.22476, -0.075129, -0.067018, -0.14247, 0.038825, -0.18951, 0.29977, 0.39305, 0.17887, -0.17343, -0.21178, 0.23617, -0.063681, -0.42318, -0.11661, 0.093754, 0.17296, -0.33073, 0.49112, -0.68995, -0.092462, 0.24742, -0.17991, 0.097908, 0.083118, 0.15299, -0.27276, -0.038934, 0.54453, 0.53737, 0.29105, -0.0073514, 0.04788, -0.4076, -0.026759, 0.17919, 0.010977,

Solution

  • You are using ES 7.x, which no longer supports mapping types (doc_type), yet you still specify this parameter in the bulk request. The mapping you used to create the index does not define a doc_type, but the bulk request indexes into a non-existent doc_type, document. As a result, Elasticsearch does not apply your mapping configuration when handling the bulk request, so it does not see the dims setting for the vector field.

    So please change:

    def gendata(model):
      for key, value in model.items():
        key = str(key)
        yield {
            "_index": "test",
            "_id": key,
            "_type": "document",
            "word": key,
            "vector": value
        }
    

    to:

    def gendata(model):
      for key, value in model.items():
        key = str(key)
        yield {
            "_index": "test",
            "_id": key,
            "word": key,
            "vector": value
        }
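
For reference, here is a minimal sketch of how the corrected generator might be wired into helpers.bulk(). It assumes the 7.x elasticsearch Python client and an already connected client object es. The index and dims parameters, the length check, and the upload() helper are illustrative additions that are not part of the original code.

```python
def gendata(model, index="test", dims=300):
    """Yield one typeless bulk action per (word, vector) pair.

    `index` and `dims` are hypothetical parameters added for illustration;
    their defaults mirror the mapping shown in the question.
    """
    for key, value in model.items():
        key = str(key)
        vector = [float(x) for x in value]
        # Fail early on the client side if a vector does not match the
        # dimension declared in the mapping.
        if len(vector) != dims:
            raise ValueError(
                f"vector for {key!r} has {len(vector)} dims, expected {dims}"
            )
        # No "_type" key: ES 7.x indices are typeless.
        yield {
            "_index": index,
            "_id": key,
            "word": key,
            "vector": vector,
        }


def upload(es, model):
    """Send all actions with the bulk helper (hypothetical wrapper)."""
    # Imported lazily so gendata() can be exercised without the client installed.
    from elasticsearch import helpers

    # helpers.bulk returns a (success_count, errors) tuple.
    return helpers.bulk(es, gendata(model))
```

Calling upload(es, model) with the client from the question would then index every word into test without a doc_type. The length check is optional; it only surfaces malformed vectors before they reach Elasticsearch.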