
Remove duplicates from JSON arrays


I want to remove the duplicates from each array in this JSON:

{
  "abc": [
    "five"
  ],
  "pqr": [
    "one",
    "one",
    "two",
    "two",
    "three",
    "three",
    "four",
    "four"
  ],
  "xyz": [
    "one",
    "one",
    "two",
    "two",
    "four"
  ]
}

Output I am expecting after removing the duplicates:

{
  "abc": [
    "five"
  ],
  "pqr": [
    "one",
    "two",
    "three",
    "four"
  ],
  "xyz": [
    "one",
    "two",
    "four"
  ]
}

I tried map, unique, and group_by with jq, but nothing helped.


Solution

  • unique can remove duplicates, but it automatically sorts the arrays, which may or may not be what you want.

    jq '.[] |= unique'
    
    {
      "abc": [
        "five"
      ],
      "pqr": [
        "four",
        "one",
        "three",
        "two"
      ],
      "xyz": [
        "four",
        "one",
        "two"
      ]
    }
    


    You can restore the original ordering by rebuilding each array from the sorted index positions of its unique items:

    jq '.[] |= [.[[index(unique[])] | sort[]]]'
    

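    To see why the index/sort filter works, here is a step-by-step sketch run on the "pqr" array alone (the inline array literal is my own test input, not part of the original question):

    ```shell
    # Walkthrough of the index/sort approach on a standalone array.
    json='["one","one","two","two","three","three","four","four"]'

    # Step 1: unique sorts its output, so collecting the first index of
    # each unique item yields indices in sorted-value order, not
    # positional order.
    echo "$json" | jq -c '[index(unique[])]'
    # → [6,0,4,2]

    # Step 2: sorting those indices and re-indexing the original array
    # restores the original ordering of first occurrences.
    echo "$json" | jq -c '[.[([index(unique[])] | sort)[]]]'
    # → ["one","two","three","four"]
    ```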

    Or circumvent any sorting behaviour by writing your own straightforward de-duplication function:

    jq '.[] |= reduce .[] as $i ([]; . + if index($i) then [] else [$i] end)'
    


    In my tests, the latter performed best. Both approaches produce the expected output:

    {
      "abc": [
        "five"
      ],
      "pqr": [
        "one",
        "two",
        "three",
        "four"
      ],
      "xyz": [
        "one",
        "two",
        "four"
      ]
    }
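
    If you need the reduce-based filter in several places, it can be packaged as a named jq function. This is a sketch under my own naming: the function name dedup and the inline sample input are mine, not from the original answer.

    ```shell
    # The reduce-based de-duplication wrapped in a reusable jq function
    # ("dedup" is a hypothetical name; the sample input mirrors the
    # question's JSON, inlined for a self-contained example).
    json='{"abc":["five"],"pqr":["one","one","two","two","three","three","four","four"],"xyz":["one","one","two","two","four"]}'

    echo "$json" | jq 'def dedup:
        reduce .[] as $i ([]; . + if index($i) then [] else [$i] end);
      map_values(dedup)'
    ```

    map_values(dedup) is equivalent to .[] |= dedup here: it applies the function to every value of the top-level object, i.e. to each array.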