arrays json intersection jq

How to get the intersection of two JSON arrays using jq


Given arrays X and Y (preferably both as inputs, but otherwise, with one as input and the other hardcoded), how can I use jq to output the array containing all elements common to both? e.g. what is a value of f such that

echo '[1,2,3,4]' | jq 'f([2,4,6,8,10])'

would output

[2,4]

?

I've tried the following:

map(select(in([2,4,6,8,10])))  --> outputs [1,2,3,4]
select(map(in([2,4,6,8,10])))  --> outputs [1,2,3,4,5]

Solution

  • A simple and fairly fast (if somewhat naive) filter that computes the intersection of two arrays can be defined as follows:

       # x and y are arrays
       def intersection(x;y):
         ( (x|unique) + (y|unique) | sort) as $sorted
         | reduce range(1; $sorted|length) as $i
             ([]; if $sorted[$i] == $sorted[$i-1] then . + [$sorted[$i]] else . end) ;
    

    If x is provided as input on STDIN, and y is provided in some other way (e.g. def y: ...), then you could use this as: intersection(.;y)
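As a concrete illustration of the usage just described, here is a minimal sketch that applies the filter to the arrays from the question. It assumes a jq version that supports `--argjson` (1.5 or later); the variable name `$y` and the shell variable `result` are chosen only for this example.

```shell
# First array arrives on STDIN; the second is bound to $y via --argjson (assumed jq 1.5+).
result=$(echo '[1,2,3,4]' | jq -c --argjson y '[2,4,6,8,10]' '
  # x and y are arrays
  def intersection(x;y):
    ( (x|unique) + (y|unique) | sort) as $sorted
    | reduce range(1; $sorted|length) as $i
        ([]; if $sorted[$i] == $sorted[$i-1] then . + [$sorted[$i]] else . end) ;
  intersection(.; $y)
')
echo "$result"
```

Because both arrays are deduplicated before being concatenated and sorted, a value can appear twice in `$sorted` only if it occurs in both inputs, which is what the adjacent-pair comparison detects.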

    Other ways to provide two distinct arrays as input include:
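Two such mechanisms that jq itself offers are the `--slurp` (`-s`) option and `--argjson`. The sketch below is illustrative only: it assumes jq 1.5 or later for `--argjson`, uses array subtraction as a compact way to compute the common elements, and the shell variable names are hypothetical.

```shell
# Option 1: --slurp (-s) collects all inputs into one array,
# so .[0] and .[1] are the two arrays read from STDIN.
slurped=$(printf '%s\n' '[1,2,3,4]' '[2,4,6,8,10]' \
  | jq -c -s '(.[0]|unique) as $x | $x - ($x - .[1])')

# Option 2: --argjson binds a JSON value to a named variable (here $y),
# leaving STDIN free for the other array.
bound=$(echo '[1,2,3,4]' \
  | jq -c --argjson y '[2,4,6,8,10]' 'unique as $x | $x - ($x - $y)')

echo "$slurped"
echo "$bound"
```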

    Here's a simpler but slower def that's nevertheless quite fast in practice:

        def i(x;y):
           if (y|length) == 0 then []
           else (x|unique) as $x
           | $x - ($x - y)
           end ;
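
To see why the subtraction works: `$x - y` leaves the elements that occur only in x, and removing those from `$x` leaves exactly the elements shared with y. The following sketch exercises the def on unsorted input with a duplicate, and on an empty second array; the shell variable names (`def_i`, `common`, `none`) are made up for this example.

```shell
# The def is kept in a shell variable so it can be reused across calls.
def_i='
  def i(x;y):
    if (y|length) == 0 then []
    else (x|unique) as $x
    | $x - ($x - y)    # ($x - y) holds elements only in x; removing them leaves the common ones
    end ;
'

# -n: no input is read; both arrays are passed as arguments to i.
common=$(jq -c -n "$def_i"' i([3,1,2,2,4]; [10,4,2,8,6])')
none=$(jq -c -n "$def_i"' i([1,2]; [])')

echo "$common"
echo "$none"
```

Since x is passed through `unique` first, the result comes back sorted and without duplicates regardless of the order of the inputs.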
    

    Here's a standalone filter for finding the intersection of arbitrarily many arrays:

    # Input: an array of arrays
    def intersection:
      def i(y):
        ((unique + (y|unique)) | sort) as $sorted
        | reduce range(1; $sorted|length) as $i
            ([]; if $sorted[$i] == $sorted[$i-1] then . + [$sorted[$i]] else . end) ;
      reduce .[1:][] as $a (.[0]; i($a)) ;
    

    Examples:

    [ [1,2,4], [2,4,5], [4,5,6]] #=> [4]
    [[]]                         #=> []
    []                           #=> null
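
The examples above can be reproduced from the shell with a small wrapper. This is only a sketch: `intersect_all` is a hypothetical helper name, and it expects a single array of arrays on STDIN.

```shell
# Hypothetical helper: reads an array of arrays on STDIN and prints their intersection.
intersect_all() {
  jq -c '
    # Input: an array of arrays
    def intersection:
      def i(y):
        ((unique + (y|unique)) | sort) as $sorted
        | reduce range(1; $sorted|length) as $i
            ([]; if $sorted[$i] == $sorted[$i-1] then . + [$sorted[$i]] else . end) ;
      reduce .[1:][] as $a (.[0]; i($a)) ;
    intersection
  '
}

echo '[[1,2,4],[2,4,5],[4,5,6]]' | intersect_all
echo '[[]]' | intersect_all
echo '[]' | intersect_all
```

The `null` result for an empty outer array follows from the reduction starting at `.[0]`, which is `null` when there is no first element.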
    

    Of course, if x and y are already known to be sorted and/or unique, more efficient solutions are possible. See in particular "Finite Sets of JSON Entities".
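
As one possible illustration of exploiting sortedness (not necessarily the approach taken in the reference above), the sketch below uses jq's builtin `bsearch` to test membership by binary search. It assumes that y is already sorted according to jq's ordering, that x contains no duplicates, and that the jq version provides `bsearch` and `--argjson` (1.5 or later).

```shell
# bsearch($e) returns the index of $e in the sorted input array,
# or a negative number when $e is absent.
fast=$(echo '[1,2,3,4]' | jq -c --argjson y '[2,4,6,8,10]' '
  map(select(. as $e | $y | bsearch($e) >= 0))
')
echo "$fast"
```

This avoids sorting or deduplicating either array, at the cost of relying on the stated assumptions: an unsorted y would make the binary search unreliable, and duplicates in x would be carried through to the result.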