I want to count duplicate values with JSONiq. I have following code:
jsoniq version "1.0";
import module namespace fetch = "http://zorba.io/modules/fetch";
let $tweets := parse-json(fetch:content("/tweets.json"))
let $users := parse-json(fetch:content("/users.json"))
return
let $different_languages :=
for $tweet in $tweets[]
return {
"name" : $tweet."metadata"."iso_language_code"
}
return [$different_languages]
This returns all the languages, but it opens a new pair for every language. It looks like this:
[ { "name" : "de" },
{ "name" : "da" },
{ "name" : "da" },
{ "name" : "da" }]
I want to return a JSON object that looks like this:
[ { "count" : 1, "language" : "de" },
{ "count" : 3, "language" : "da" }]
How can i achieve this?
This can be achieved with a group-by clause. This works like a SQL group by, but with an even finer level of control.
In the code below, the four objects in the unboxed $tweets
array are grouped according to their language field ($tweet.metadata.iso_language_code
). In each evaluation of the return clause, the grouping variable $language
will contain the name of the language for the current group, and the non-grouping variable $tweet
will contain the sequence of tweets that belong to the group. Calling count()
on this sequence will return 3 and 1 respectively.
jsoniq version "1.0";
import module namespace fetch = "http://zorba.io/modules/fetch";
let $tweets := parse-json(fetch:content("/tweets.json"))
let $users := parse-json(fetch:content("/users.json"))
return
for $tweet in $tweets[]
group by $language := $tweet."metadata"."iso_language_code"
return { language: $language, count: count($tweet) }
Also, the quotes on metadata
and iso_language_code
are not needed. If needed, you can also wrap the result into an array, like so:
jsoniq version "1.0";
import module namespace fetch = "http://zorba.io/modules/fetch";
let $tweets := parse-json(fetch:content("/tweets.json"))
let $users := parse-json(fetch:content("/users.json"))
return [
for $tweet in $tweets[]
group by $language := $tweet.metadata.iso_language_code
return { language: $language, count: count($tweet) }
]
There is no restriction on the expression in the return clause: $language
and $tweet
are variables like any others and they can be used as inputs to any JSONiq expression. For instance, instead of counting the tweets, one can also nest them in the output, as the data model is arborescent:
return { language: $language, count: [ $tweet ] }