
How to split an array into chunks with jq?


I have a very large JSON file containing an array. Is it possible to use jq to split this array into several smaller arrays of a fixed size? Suppose my input were [1,2,3,4,5,6,7,8,9,10] and I wanted to split it into chunks of three elements. The desired output from jq would be:

[1,2,3]
[4,5,6]
[7,8,9]
[10]

In reality, my input array has nearly three million elements, all UUIDs.


Solution

  • The following stream-oriented definition of window/3, due to Cédric Connes (github:connesc), generalizes _nwise. It illustrates a "boxing technique": each stream value is wrapped in a singleton array, so no value of the stream itself has to be reserved as an end-of-stream marker. The definition can therefore be used even if the stream contains the non-JSON value nan. A definition of _nwise/1 in terms of window/3 is also included.

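    The first argument of window/3 is interpreted as a stream. $size is the window size, and $step is the distance, counted in stream items, between the starts of consecutive windows (so $step < $size gives overlapping windows, and $step > $size skips items between windows). For example,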

    window(1,2,3; 2; 1)
    

    yields:

    [1,2]
    [2,3]
    

    window/3 and _nwise/1

    def window(values; $size; $step):
      def checkparam(name; value): if (value | isnormal) and value > 0 and (value | floor) == value then . else error("window \(name) must be a positive integer") end;
      checkparam("size"; $size)
    | checkparam("step"; $step)
      # We need to detect the end of the loop in order to produce the terminal partial group (if any).
      # For that purpose, we introduce an artificial null sentinel, and wrap the input values into singleton arrays in order to distinguish them.
    | foreach ((values | [.]), null) as $item (
        {index: -1, items: [], ready: false};
        (.index + 1) as $index
        # Extract items that must be reused from the previous iteration
        | if (.ready | not) then .items
          elif $step >= $size or $item == null then []
          else .items[-($size - $step):]
          end
        # Append the current item unless it must be skipped
        | if ($index % $step) < $size then . + $item
          else .
          end
        | {$index, items: ., ready: (length == $size or ($item == null and length > 0))};
        if .ready then .items else empty end
      );
    
    def _nwise($n): window(.[]; $n; $n);
    

    Source:

    https://gist.github.com/connesc/d6b87cbacae13d4fd58763724049da58
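
As a rough illustration of how the definitions above might be applied to the input from the question, the sketch below saves window/3 (comments trimmed) to a module file and calls it two ways: once on the fully parsed array, and once with jq's --stream option so that the array is not held in memory as a whole. The file names window.jq and input.json and the scratch directory are assumptions made for this example, and the streaming variant assumes a top-level array of scalars (such as UUID strings).

```shell
#!/bin/sh
set -eu

# Hypothetical scratch directory and file names, used only for this sketch.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

printf '[1,2,3,4,5,6,7,8,9,10]\n' > "$dir/input.json"

# window.jq: the definition of window/3 from the answer, comments trimmed.
cat > "$dir/window.jq" <<'EOF'
def window(values; $size; $step):
  def checkparam(name; value):
    if (value | isnormal) and value > 0 and (value | floor) == value then .
    else error("window \(name) must be a positive integer")
    end;
  checkparam("size"; $size)
  | checkparam("step"; $step)
  | foreach ((values | [.]), null) as $item (
      {index: -1, items: [], ready: false};
      (.index + 1) as $index
      | if (.ready | not) then .items
        elif $step >= $size or $item == null then []
        else .items[-($size - $step):]
        end
      | if ($index % $step) < $size then . + $item else . end
      | {$index, items: ., ready: (length == $size or ($item == null and length > 0))};
      if .ready then .items else empty end
    );
EOF

# Variant 1: jq parses the whole array first, then .[] feeds window/3.
chunks=$(jq -c -L "$dir" \
  'include "window"; window(.[]; 3; 3)' "$dir/input.json")

# Variant 2: --stream emits [path, value] events one at a time.
# For a flat array of scalars, the events of length 2 carry the elements;
# the final length-1 event only closes the array and is dropped.
streamed=$(jq -c -n --stream -L "$dir" \
  'include "window"; window(inputs | select(length == 2) | .[1]; 3; 3)' \
  "$dir/input.json")

printf '%s\n' "$chunks"
printf '%s\n' "$streamed"
```

Both invocations print the four chunks requested in the question, [1,2,3] through [10]. For an array of nearly three million UUIDs, the second form is probably the one to try first, since only the current chunk needs to be kept in memory.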