I am querying ElasticSearch and sorting the documents locally in Bash with jq, because sorting in ES is too slow for me. The original purpose is to create a CSV file. But I find the sorting does not work properly; the sort step seems to do nothing. Since I am launching cURL requests, I thought the wrong order was caused by the content being chunked, so I saved some results into a local test.json file and tried again, but it still does not work.
test.json:
{
  "took": 680,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "max_score": 1.0,
    "hits": [
      {
        "_index": "my-index",
        "_type": "_doc",
        "_id": "111111113584925",
        "_score": 1.0,
        "fields": {
          "field2": [
            "FOO"
          ],
          "field1": [
            "111111113584925"
          ]
        }
      },
      {
        "_index": "my-index",
        "_type": "_doc",
        "_id": "111111121254059",
        "_score": 1.0,
        "fields": {
          "field2": [
            "FOO"
          ],
          "field1": [
            "111111121254059"
          ]
        }
      }
    ]
  }
}
(There are many more records - edited for brevity.)
Command that I use:
jq '.hits.hits[].fields | [.field1[0] + "," + .field2[0]] | sort | .[0]' -r test.json
The result:
111111113584925,FOO
111111121254059,FOO
111111116879444,FOO
etc.
Why? Should I rely on jq sorting? Am I using it correctly? I want a string comparison in alphabetical order, and the field1 values are all unique, so there will never be a tie that falls through to comparing field2 (field2 can also hold various values, but I only want to sort by field1).
Should I use Bash sort -k 1 instead? Which is faster when it comes to 100K rows?
You're looking for something like this:
.hits.hits | map(.fields | .field1[0] + "," + .field2[0]) | sort[]
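The reason your original filter appears to do nothing: .hits.hits[].fields emits each hit's fields object as a separate output, so the following [ ... ] builds a one-element array for every hit. sort then sorts that single-element array (a no-op) and .[0] simply unwraps it again, leaving the original order. The map(...) version above first collects all the CSV strings into one array, so sort can actually order them, and sort[] streams the sorted elements back out.
A minimal sketch of the full command, assuming the same test.json and an output file name of your own choosing (sorted.csv here is just an example):
jq -r '.hits.hits | map(.fields | .field1[0] + "," + .field2[0]) | sort[]' test.json > sorted.csv
If you would rather sort outside jq, you can keep your original extraction and pipe it through the external sort instead. Because the field1 value is the first thing on each line, a plain sort (or sort -t, -k1,1 to make the comma delimiter explicit) gives the same order:
jq -r '.hits.hits[].fields | .field1[0] + "," + .field2[0]' test.json | sort
Either approach should handle 100K rows comfortably; at that scale the difference between jq's sort and the external sort is unlikely to matter.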