
How to do bulk indexing into Elasticsearch from Python


I have nearly 10K JSON documents and I want to push all of them to Elasticsearch using the Elasticsearch bulk API from Python. I went through some docs but didn't find a solution.

result=es.bulk(index="index1", doc_type="index123", body=jsonvalue)
helpers.bulk(es,doc) 

I tried both, but neither worked. I am getting this error:

elasticsearch.exceptions.RequestError: TransportError(400, u'illegal_argument_exception', u'Malformed action/metadata line [1], expected START_OBJECT or END_OBJECT but found [VALUE_STRING]')

Please help.


Solution

  • I prefer using the bulk function in the helpers module for bulk indexing. Try the following:

    from elasticsearch import helpers
    res = helpers.bulk(es, jsonvalue, chunk_size=1000, request_timeout=200)
    

    Your jsonvalue needs to follow a particular format. It needs to be a list of the 10K documents, where each document is a dict that carries the metadata fields _index, _type and _id alongside its own fields:

    doc = {
        '_index': 'your-index',
        '_type': 'your-type',
        '_id': 'your-id',
        'field_1': 'value_1',
        ...
    }
    

    So your final jsonvalue would look something like this:

    jsonvalue = [
        {
            '_index': 'your-index',
            '_type': 'your-type',
            '_id': 'your-id-1',  # each document needs its own _id, or later ones overwrite earlier ones
            'field_1': 'value_1',
            ...
        },
        {
            '_index': 'your-index',
            '_type': 'your-type',
            '_id': 'your-id-2',
            'field_1': 'value_2',
            ...
        },
        {
            '_index': 'your-index',
            '_type': 'your-type',
            '_id': 'your-id-3',
            'field_1': 'value_3',
            ...
        }
    ]
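
If your documents are plain dicts or JSON strings without the metadata fields, you don't have to build that list by hand. The error in the question is typical of a bulk body that has no action/metadata lines, which is likely what happens when bare documents or a raw JSON string are passed. Below is a minimal sketch of one way to wrap each document before handing it to helpers.bulk. The names build_actions, index_documents, index_name and id_field are made up for illustration, and _type is left out on the assumption that your cluster runs a recent Elasticsearch version (mapping types were deprecated in 7.x and removed in 8.x); add it back if your version needs it.

```python
import json


def build_actions(docs, index_name, id_field=None):
    """Yield one bulk action per document (hypothetical helper).

    docs may contain dicts or JSON strings; strings are parsed first.
    If id_field is given and present in a document, its value is used
    as the Elasticsearch _id; otherwise Elasticsearch generates one.
    """
    for doc in docs:
        if isinstance(doc, str):
            doc = json.loads(doc)
        # The document itself goes under _source; metadata sits next to it.
        action = {"_index": index_name, "_source": doc}
        if id_field is not None and id_field in doc:
            action["_id"] = doc[id_field]
        yield action


def index_documents(es, docs, index_name, id_field=None):
    """Send the documents in chunks of 1000 via helpers.bulk.

    es is assumed to be an already constructed Elasticsearch client.
    """
    # Imported here so build_actions can be used without the client installed.
    from elasticsearch import helpers

    # With default settings helpers.bulk returns (success_count, errors)
    # and raises BulkIndexError if any document fails.
    return helpers.bulk(
        es,
        build_actions(docs, index_name, id_field),
        chunk_size=1000,
        request_timeout=200,  # same values as in the answer above
    )
```

Because build_actions is a generator, the 10K documents are wrapped lazily, one chunk at a time, instead of being copied into a second list. A call would look something like index_documents(es, my_docs, "index1", id_field="id"), where my_docs and the "id" field are placeholders for your own data.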