json jq

Get unique objects and arrays based on key value


Is there a way to return a unique object/array when duplicates exist? Here's what I'm trying to do.

I have a payload like this:

{
  "data": [
    {
      "account": "12xUoMKwf12ABjNx4VCvYcNkX79gW1kzz2JnBLxkFbjswRczRvM",
      "amount": 7885016,
      "block": 470788,
      "gateway": "113kQU96zqePySTahB7PEde9ZpoWK76DYK1f57wyhjhXCBoAu88",
      "hash": "DTU1GGfR0eU15hv6KiV_bg6FOJXfUWz4TjIq1H7TGy4",
      "timestamp": "2020-08-28T01:29:46.000000Z"
    }
  ]
}
{
  "data": [
    {
      "account": "12xUoMKwf12ABjNx4VCvYcNkX79gW1kzz2JnBLxkFbjswRczRvM",
      "amount": 7885016,
      "block": 470788,
      "gateway": "113kQU96zqePySTahB7PEde9ZpoWK76DYK1f57wyhjhXCBoAu88",
      "hash": "DTU1GGfR0eU15hv6KiV_bg6FOJXfUWz4TjIq1H7TGy4",
      "timestamp": "2020-08-28T01:29:46.000000Z"
    }
  ]
}
{
  "data": [
    {
      "account": "12xUoMKwf12ABjNx4VCvYcNkX79gW1kzz2JnBLxkFbjswRczRvM",
      "amount": 8623955,
      "block": 470509,
      "gateway": "113kQU96zqePySTahB7PEde9ZpoWK76DYK1f57wyhjhXCBoAu88",
      "hash": "5fQJY9MprH9b3IstVU1SdfBteUWoF_sdsVuiARPBtTY",
      "timestamp": "2020-08-27T19:01:48.000000Z"
    }
  ]
}

As you can see, the first two payloads are identical and the last one is unique. I need to get the unique objects and then sum their .amount values when the timestamp falls within a certain time period. Here's what I have so far:

jq --arg this "$(date +%Y-%m-%dT%H:%M:%S)" '.data[] | select(.timestamp >= $this) | .amount'

This gives me the amounts so I can sum them up, but it also includes the duplicates. What I would like to do is get the objects that are unique by their .hash. The idea is to sum up the total amounts that fall within the given date.


Solution

  • What I would like to do is get the objects that are unique by their .hash

    One way to remove the duplicates would be to use unique_by/1 in conjunction with the -s command-line option.

    Assuming you want all the items in all the .data arrays you could start your pipeline with:

    jq -s 'map(.data[]) | unique_by(.hash) ...' 
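To make the effect of -s concrete, here is a minimal sketch on a cut-down stream. The three one-line payloads and the variable name deduped are made up for illustration; only the hash and amount keys are kept. With -s, jq reads the whole stream into a single array, so map(.data[]) can flatten every .data array before unique_by(.hash) drops the repeated hash.

```shell
# Hypothetical cut-down input: two identical payloads and one distinct payload.
deduped=$(printf '%s\n' \
  '{"data":[{"hash":"a","amount":1}]}' \
  '{"data":[{"hash":"a","amount":1}]}' \
  '{"data":[{"hash":"b","amount":2}]}' \
  | jq -c -s 'map(.data[]) | unique_by(.hash)')

# One object per distinct hash remains.
echo "$deduped"
```

Note that unique_by also sorts its result by the key it is given, so the original order of the stream is not preserved.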
    

    However, since you are really only interested in the .timestamp and .amount fields, it would be more efficient to proceed along the following lines:

    jq -s --arg this "$(date +%Y-%m-%dT%H:%M:%S)" '
      map(.data[] | select(.timestamp >= $this) | {hash, amount})
      | unique_by(.hash)[]
      | .amount
    ' input.json
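The filter above prints each remaining amount on its own line. If a single total is wanted, one possible variation is to keep the de-duplicated array and reduce it with add. The sketch below is an assumption-laden example, not part of the original answer: it writes a compacted copy of the question's payloads (only the fields the filter uses) to input.json, uses a fixed cutoff instead of the current date so the result is reproducible, and falls back to 0 when nothing matches, since add on an empty array yields null.

```shell
# Compacted copy of the three payloads from the question (hash, amount, timestamp only).
printf '%s\n' \
  '{"data":[{"amount":7885016,"hash":"DTU1GGfR0eU15hv6KiV_bg6FOJXfUWz4TjIq1H7TGy4","timestamp":"2020-08-28T01:29:46.000000Z"}]}' \
  '{"data":[{"amount":7885016,"hash":"DTU1GGfR0eU15hv6KiV_bg6FOJXfUWz4TjIq1H7TGy4","timestamp":"2020-08-28T01:29:46.000000Z"}]}' \
  '{"data":[{"amount":8623955,"hash":"5fQJY9MprH9b3IstVU1SdfBteUWoF_sdsVuiARPBtTY","timestamp":"2020-08-27T19:01:48.000000Z"}]}' \
  > input.json

# Fixed cutoff chosen for illustration; the question uses the current date instead.
cutoff="2020-08-27T00:00:00"

total=$(jq -s --arg this "$cutoff" '
  map(.data[] | select(.timestamp >= $this) | {hash, amount})
  | unique_by(.hash)      # keep one entry per hash
  | map(.amount)
  | add // 0              # sum the amounts; 0 if nothing matched
' input.json)

echo "$total"
```

The timestamp test is a plain string comparison, so it only behaves as a date comparison when both sides share the same layout and time zone. Since the sample timestamps end in Z, generating the cutoff with date -u may be closer to what is intended.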