jsonjsoniq

JSONiq count duplicates in array


I want to count duplicate values with JSONiq. I have following code:

jsoniq version "1.0";

import module namespace fetch = "http://zorba.io/modules/fetch";

let $tweets := parse-json(fetch:content("/tweets.json"))
let $users := parse-json(fetch:content("/users.json"))

return 
    let $different_languages :=
        for $tweet in $tweets[]
        return {
            "name" : $tweet."metadata"."iso_language_code" 
        }


    return [$different_languages]

This returns all the languages, but it opens a new pair for every language. It looks like this:

    [ { "name" : "de" }, 
      { "name" : "da" },
      { "name" : "da" },
      { "name" : "da" }]

I want to return a JSON object that looks like this:

    [ { "count" : 1, "language" : "de" }, 
      { "count" : 3, "language" : "da" }]

How can i achieve this?


Solution

  • This can be achieved with a group-by clause. This works like a SQL group by, but with an even finer level of control.

    In the code below, the four objects in the unboxed $tweets array are grouped according to their language field ($tweet.metadata.iso_language_code). In each evaluation of the return clause, the grouping variable $language will contain the name of the language for the current group, and the non-grouping variable $tweet will contain the sequence of tweets that belong to the group. Calling count() on this sequence will return 3 and 1 respectively.

    jsoniq version "1.0";
    
    import module namespace fetch = "http://zorba.io/modules/fetch";
    
    let $tweets := parse-json(fetch:content("/tweets.json"))
    let $users := parse-json(fetch:content("/users.json"))
    
    return 
      for $tweet in $tweets[]
      group by $language := $tweet."metadata"."iso_language_code" 
      return { language: $language, count: count($tweet) }
    

    Also, the quotes on metadata and iso_language_code are not needed. If needed, you can also wrap the result into an array, like so:

    jsoniq version "1.0";
    
    import module namespace fetch = "http://zorba.io/modules/fetch";
    
    let $tweets := parse-json(fetch:content("/tweets.json"))
    let $users := parse-json(fetch:content("/users.json"))
    
    return [
      for $tweet in $tweets[]
      group by $language := $tweet.metadata.iso_language_code
      return { language: $language, count: count($tweet) }
    ]
    

    There is no restriction on the expression in the return clause: $language and $tweet are variables like any others and they can be used as inputs to any JSONiq expression. For instance, instead of counting the tweets, one can also nest them in the output, as the data model is arborescent:

    return { language: $language, count: [ $tweet ] }