clojure ring transducer cheshire

How to adapt the IReduceInit from next.jdbc to stream JSON using cheshire to an HTTP response using ring


tl;dr how to turn an IReduceInit into a lazy-seq of transformed values

I have a database query which yields a reasonably large dataset for live pivoting on the client (a million or two rows, 25 attributes; no problem for a modern laptop).

My (simplified) stack was to call clojure.java.jdbc to get what I thought was a lazy sequence of result rows. I could just serialise that by passing it out as the body through ring-json middleware. There was an issue with ring-json building up the response string on heap, but as of 0.5.0 it has an option to stream the response out.

Profiling a couple of failure cases showed me that clojure.java.jdbc actually realises the whole result set in memory before handing it back. No problem! Rather than work with reducible-query in that library, I decided to move to the new next.jdbc.

The key operation in next.jdbc is plan, which returns an IReduceInit that I can use to run a query and get a result set...

(into [] (map :cc_id) (jdbc/plan ds ["select cc_id from organisation where cc_id = '675192'"]))
["675192"]

However this realises the whole result set, and in the above case would give me all the ids upfront and in memory. Not an issue for one, but I usually have many.

The plan IReduceInit is a thing I can reduce if I give a starting value, so I could do the output in the reducing function... (thx @amalloy)

(reduce #(println (:cc_id %2)) [] (jdbc/plan ds ["select cc_id from organisation where cc_id = '675192'"]))
675192
nil

...but ideally I'd like to turn this IReduceInit into a lazy sequence of the values after applying a transform function to them, so I can use them with ring-json and cheshire. I don't see any obvious way of doing that.
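For context, the transform half of this can already be done without leaving the reducible world. The following is a minimal sketch, assuming clojure.core's eduction is acceptable as the wrapper; fake-plan is a hypothetical stand-in for jdbc/plan (it is not part of next.jdbc) so the example runs without a database. The result is still something to reduce, not a lazy seq.

```clojure
(import '(clojure.lang IReduceInit)
        '(java.io StringWriter Writer))

;; Hypothetical stand-in for (jdbc/plan ds [...]); not part of next.jdbc.
;; It only supports reduce with an initial value, like the real plan result.
(defn fake-plan [n]
  (reify IReduceInit
    (reduce [_ f init]
      (clojure.core/reduce f init (map (fn [i] {:cc_id (str i)}) (range n))))))

;; eduction attaches the transducer; the transform runs each time it is reduced.
(def ids (eduction (map :cc_id) (fake-plan 3)))

;; The reducing function does the output and threads the Writer through
;; as the accumulator, so no collection of rows is ever built.
(defn write-line [^Writer w ^String s]
  (doto w
    (.write s)
    (.write "\n")))

(def out (StringWriter.))
(reduce write-line out ids)

(str out) ;; => "0\n1\n2\n"
```

The StringWriter is only there to make the output inspectable; in a handler the accumulator would be whatever writes to the response.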


Solution

  • There are quite a few reasons why my lazy-seq was a bad idea. Even if I guarantee not to hold the head, an exception during result streaming would most likely leave the ResultSet lying around, because the serialisation would happen away from the call stack that could clean up.

    The need for laziness is driven by the desire not to realise the whole result in memory; the need for a seq or other coll? is only there so that the middleware will serialise it...

    Therefore, make the IReduceInit JSONable directly, and then bypass the middleware. If there's an exception during serialisation, control passes back through the IReduceInit from next.jdbc, which can then clean up meaningfully.

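    ;; namespaces and classes the two snippets below rely on
    (require '[cheshire.core :as json]
             '[cheshire.generate]
             '[clojure.java.io :as io]
             '[ring.core.protocols :as ring-protocols])
    (import '(clojure.lang IPersistentMap IReduceInit)
            '(com.fasterxml.jackson.core JsonGenerator))

    ;; reuse this body generator from my patch to ring.middleware.json directly, as the coll? check will fail
    (defrecord JsonStreamingResponseBody [body options]
      ring-protocols/StreamableResponseBody
      (write-body-to-stream [_ _ output-stream]
        (json/generate-stream body (io/writer output-stream) options)))
    
    ;; the year long yak is shaved in 8 lines by providing a custom serialiser for IReduceInits…
    (extend-type IReduceInit
      cheshire.generate/JSONable
      (to-json [^IReduceInit results ^JsonGenerator jg]
        (.writeStartArray jg)
        ;; each row is written as it arrives; the accumulator is ignored
        (let [rf (fn [_ ^IPersistentMap m]
                   (cheshire.generate/encode-map m jg))]
          (reduce rf nil results))
        (.writeEndArray jg)))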
    
    ;; at this point I can wrap the result from next.jdbc/plan with ->JsonStreamingResponseBody into the :body of the ring response and it will stream
    

    It still feels like a lot of work to compose these features; adapter code always makes me worry that I'm missing a simple, idiomatic approach.
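The cleanup argument can be checked without a database or cheshire. This is a minimal sketch under stated assumptions: fake-plan, closes and write-array! are hypothetical names invented for the illustration, and the hand-rolled array output only mirrors the shape of the to-json above. The point is that an exception thrown inside the reducing function unwinds through the reduce, so the finally that owns the resource still runs.

```clojure
(import '(clojure.lang ExceptionInfo IReduceInit)
        '(java.io StringWriter Writer))

;; Counts how often the pretend ResultSet was released; illustrative only.
(def closes (atom 0))

;; Hypothetical stand-in for jdbc/plan: the resource lives only for the
;; duration of the reduce and is released in finally, even on an exception.
(defn fake-plan [rows]
  (reify IReduceInit
    (reduce [_ f init]
      (try
        (clojure.core/reduce f init rows)
        (finally (swap! closes inc))))))

;; Same shape as the to-json above: open the array, write each row from
;; inside the reducing function, close the array. Not cheshire output.
(defn write-array! [^Writer w results]
  (.write w "[")
  (reduce (fn [first? row]
            (when (= ::bad row)
              (throw (ex-info "serialisation failed" {:row row})))
            (when-not first? (.write w ","))
            (.write w ^String (pr-str (:cc_id row)))
            false)
          true
          results)
  (.write w "]"))

;; Happy path: both rows are streamed and the resource is released once.
(def ok (StringWriter.))
(write-array! ok (fake-plan [{:cc_id "1"} {:cc_id "2"}]))

;; Failure path: the second row blows up mid-stream.
(def outcome
  (try
    (write-array! (StringWriter.) (fake-plan [{:cc_id "1"} ::bad]))
    :completed
    (catch ExceptionInfo _ :threw)))
```

After both calls, closes holds 2: the resource was released on the successful run and on the failed one, which a lazy seq consumed outside the reduce could not guarantee.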