jsonelasticsearchcurlelasticsearch-bulk-api

Upload new line JSON to Elasticsearch bulk API


I'm trying to upload a new line JSON to Elasticsearch using the Bulk API. The bulk JSON I'm uploading looks like this, with each JSON on a new line:

{"ip": "x.x.x.x", "seen": true, "classification": "malicious", "spoofable": false, "first_seen": "2020-03-31", "last_seen": "2020-04-15", "actor": "unknown", "tags": ["ADB Worm", "HTTP Alt Scanner", "Mirai", "Web Scanner"], "cve": [], "metadata": {"country": "United Kingdom", "country_code": "GB", "city": "redacted", "organization": "redacted", "rdns": "", "asn": "ASxxx", "tor": false, "os": "Linux 2.2-3.x", "category": "isp"}, "raw_data": {"scan": [{"port": 80, "protocol": "TCP"}, {"port": 81, "protocol": "TCP"}, {"port": 88, "protocol": "TCP"}, {"port": 5555, "protocol": "TCP"}, {"port": 8080, "protocol": "TCP"}], "web": {}, "ja3": []}}
{"ip": "x.x.x.x", "seen": true, "classification": "malicious", "spoofable": true, "first_seen": "2020-04-09", "last_seen": "2020-04-11", "actor": "unknown", "tags": ["Eternalblue", "SMB Scanner"], "cve": ["CVE-2017-0144"], "metadata": {"country": "United Kingdom", "country_code": "GB", "city": "redacted", "organization": "redacted", "rdns": "host.somehost.com", "asn": "ASxxx", "tor": false, "os": "Windows 7/8", "category": "isp"}, "raw_data": {"scan": [{"port": 445, "protocol": "TCP"}], "web": {}, "ja3": []}}
{"ip": "x.x.x.x", "seen": true, "classification": "malicious", "spoofable": true, "first_seen": "2019-09-05", "last_seen": "2020-04-06", "actor": "unknown", "tags": ["Mirai"], "cve": [], "metadata": {"country": "United Kingdom", "country_code": "GB", "city": "redacted", "organization": "redacted", "rdns": "redacted", "asn": "ASxxx", "tor": false, "os": "Linux 2.2.x-3.x (Embedded)", "category": "isp"}, "raw_data": {"scan": [{"port": 23, "protocol": "TCP"}, {"port": 2323, "protocol": "TCP"}], "web": {}, "ja3": []}}

There's no index or key at the head of the JSON. So of course when I try to upload it with this command (my_index is a blank index with no mapping).

curl -s -H 'Content-Type: application/x-ndjson' -X POST http://localhost:9200/my_index/_bulk --data-binary @my_newline_json.json

I get the error message:

{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"The bulk request must be terminated by a newline [\\n]"}],"type":"illegal_argument_exception","reason":"The bulk request must be terminated by a newline [\\n]"},"status":400}

So if I understand the problem correctly as per the docs, the issue is that the error is because there's no index or type specified at the start of the JSON. My problem is that I don't understand how to add the necessary index and type so that the JSON can be read.

I'm using Curl to create and add data to my index so what would the best way be format a curl command to create the index properly and allow my JSON to be uploaded?

(I have previously used the excellent Elasticsearch_loader tool by MosheZada which lets you specify the index and type in the command. This worked well but I'm trying to understand what is happening in that command and how I could do the same thing with Curl if needed.)


Solution

  • curl -H 'Content-Type: application/x-ndjson' -XPOST 'localhost:9200/index-name/doc-type/_bulk?pretty' --data-binary @my_newline_json.json
    

    Change your bulk JSON, to the following format. Your my_newline_json.json should look like this:

    {"index":{}}
    {"ip": "x.x.x.x", "seen": true, "classification": "malicious", "spoofable": false, "first_seen": "2020-03-31", "last_seen": "2020-04-15", "actor": "unknown", "tags": ["ADB Worm", "HTTP Alt Scanner", "Mirai", "Web Scanner"], "cve": [], "metadata": {"country": "United Kingdom", "country_code": "GB", "city": "redacted", "organization": "redacted", "rdns": "", "asn": "ASxxx", "tor": false, "os": "Linux 2.2-3.x", "category": "isp"}, "raw_data": {"scan": [{"port": 80, "protocol": "TCP"}, {"port": 81, "protocol": "TCP"}, {"port": 88, "protocol": "TCP"}, {"port": 5555, "protocol": "TCP"}, {"port": 8080, "protocol": "TCP"}], "web": {}, "ja3": []}}
    {"index":{}}
    {"ip": "x.x.x.x", "seen": true, "classification": "malicious", "spoofable": true, "first_seen": "2020-04-09", "last_seen": "2020-04-11", "actor": "unknown", "tags": ["Eternalblue", "SMB Scanner"], "cve": ["CVE-2017-0144"], "metadata": {"country": "United Kingdom", "country_code": "GB", "city": "redacted", "organization": "redacted", "rdns": "host.somehost.com", "asn": "ASxxx", "tor": false, "os": "Windows 7/8", "category": "isp"}, "raw_data": {"scan": [{"port": 445, "protocol": "TCP"}], "web": {}, "ja3": []}}
    {"index":{}}
    {"ip": "x.x.x.x", "seen": true, "classification": "malicious", "spoofable": true, "first_seen": "2019-09-05", "last_seen": "2020-04-06", "actor": "unknown", "tags": ["Mirai"], "cve": [], "metadata": {"country": "United Kingdom", "country_code": "GB", "city": "redacted", "organization": "redacted", "rdns": "redacted", "asn": "ASxxx", "tor": false, "os": "Linux 2.2.x-3.x (Embedded)", "category": "isp"}, "raw_data": {"scan": [{"port": 23, "protocol": "TCP"}, {"port": 2323, "protocol": "TCP"}], "web": {}, "ja3": []}}
    

    Dont forget to add a new line at the end of your content.

    Format of bulk JSON:

    enter image description here

    Output Result:

    enter image description here