Tags: json, elasticsearch, file-import

ElasticSearch JSON file import (Bulk API)


I saw a few similar posts to this here on Stack Overflow, but I still don't have a clear understanding of how to index a large file of JSON documents into Elasticsearch; I'm getting responses like the following:

{"error":"ActionRequestValidationException[Validation Failed: 1: index is missing;2: type is missing;]","status":400}

{"took":231,"errors":false,"items":[{"index":{"_index":"test","_type":"type1","_id":"1","_version":7,"status":200}}]}

I have a JSON file that is about 2 GB in size, which is the file I actually want to import. But first, to understand how the Bulk API works, I created a small file containing just a single document:

testfile.json

{"index":{"_id":"someId"}}
{"id":"testing"}

I got this from another post on SO. I understand that the first line is an action line, and that the "index" in it is the operation Elasticsearch should perform; however, this still does not work. Can someone please give me a working example and a clear explanation of how to import a JSON file into Elasticsearch?

Thank you!


Solution

  • The following sample comes from the Elasticsearch documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html?q=bulk

    { "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
    { "field1" : "value1" }
    { "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" } }
    { "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" } }
    { "field1" : "value3" }
    { "update" : {"_id" : "1", "_type" : "type1", "_index" : "index1"} }
    { "doc" : {"field2" : "value2"} }
    

    So line one tells Elasticsearch to index the document on line two into index test, type type1, with _id 1; the indexed document contains field1. If all your documents go to the same index and type, you can omit _index and _type from the action lines and put them in the request URL instead (e.g. POST to /test/type1/_bulk). Check the link for samples.
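As a minimal sketch of building such a request body in Python: each document becomes an action line followed by a source line, joined with newlines. The index and type are omitted from the action lines here on the assumption that they will be supplied in the URL; the sample documents and IDs are made up for illustration.

```python
import json

# Hypothetical sample data: doc ID -> document source
docs = {"1": {"field1": "value1"}, "2": {"field1": "value2"}}

# Build the NDJSON bulk body: one action line per document, then its source.
# _index/_type are left out because we assume they go in the URL instead.
lines = []
for doc_id, source in docs.items():
    lines.append(json.dumps({"index": {"_id": doc_id}}))
    lines.append(json.dumps(source))
body = "\n".join(lines) + "\n"  # the Bulk API requires a trailing newline
```

The resulting `body` can then be sent with something like `curl -s -XPOST 'http://localhost:9200/test/type1/_bulk' --data-binary @file.json` (host and index name are assumptions; `--data-binary` matters because plain `-d` would strip the newlines).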

    In line three you see an example of a delete action; unlike index, a delete does not take a source document on the following line, so line four already starts the next action.

    Be careful with very large files: 2 GB is probably too big for a single request. The whole request body has to be sent to Elasticsearch, which loads it into memory, so there is a limit to the number of records you can send at once. Split the file into smaller batches and send each batch as its own bulk request.
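A sketch of that batching step, assuming every action in the file carries a source document (so the lines come in action/source pairs; the batch size and sample data are arbitrary):

```python
import json
from itertools import islice

def bulk_batches(lines, batch_size=1000):
    """Group bulk NDJSON lines into batches of `batch_size` actions.

    Assumes each action is two lines (action metadata + document
    source), which holds for index/create/update but not delete.
    """
    it = iter(lines)
    while True:
        batch = list(islice(it, batch_size * 2))  # 2 lines per action
        if not batch:
            break
        yield "\n".join(batch) + "\n"  # each batch needs its own trailing newline

# Hypothetical example: 5 documents, batches of 2 actions each
lines = []
for i in range(5):
    lines.append(json.dumps({"index": {"_id": str(i)}}))
    lines.append(json.dumps({"id": "doc-%d" % i}))

batches = list(bulk_batches(lines, batch_size=2))
# Each batch can then be POSTed to the _bulk endpoint separately.
```

For a real 2 GB file you would read it line by line rather than loading it into a list, but the grouping logic is the same.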