How should I format data for a Kinesis event when using Amazonica in Clojure?


When I put an event into a stream using the AWS CLI, I can pass JSON in and get it back out after decoding from base64. When I try to put an event using Amazonica from Clojure, though, I am having a hard time formatting the event data parameter correctly.

(kinesis/put-record "ad-stream" {:ad-id "some-id"} "partition-key")

creates an event with a base64 encoded data field of "TlBZCAAAABXwBhsAAAACagVhZC1pZGkHc29tZS1pZA==", which decodes to

NP�jad-idisome-id
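
To look at such a payload byte by byte rather than as printed text, it can be decoded at the REPL. The sketch below is a minimal example using only `java.util.Base64`; `decode-base64` is a hypothetical helper name, not part of Amazonica. The first three bytes come out as the ASCII text `NPY`, which resembles the header written by the Nippy serialization library. That Amazonica uses Nippy here is an assumption worth checking against its README.

```clojure
(import '(java.util Base64))

(defn decode-base64
  "Decodes a Base64 string into a byte array (hypothetical helper)."
  ^bytes [^String s]
  (.decode (Base64/getDecoder) s))

;; The data field of the record created by the put-record call above.
(def payload
  (decode-base64 "TlBZCAAAABXwBhsAAAACagVhZC1pZGkHc29tZS1pZA=="))

;; The first three bytes, read as ASCII text.
(def header (String. ^bytes payload 0 3 "US-ASCII"))

;; The last seven bytes, read as ASCII text.
(def tail (String. ^bytes payload 24 7 "US-ASCII"))

(println header) ; prints NPY
(println tail)   ; prints some-id
```

The bytes between the header and the readable strings are binary framing, which is why the decoded text looks like junk when printed.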

If I JSON encode the data first:

 (kinesis/put-record "ad-stream" (json/write-str {:ad-id "some-id-2"}) "partition-key")

then I get an event with fewer junk characters, but the payload still has a junk prefix, so other apps cannot read it as JSON without breaking:

NPi{"ad-id":"some-id-2"}

What is the significance of that leading junk when converting Clojure maps to JSON? How do I pass a simple object to Kinesis?

The tests show a plain map being passed as put-record's data parameter. I don't yet understand why that didn't just work for me:

  (let [data {:name "any data"
              :col  #{"anything" "at" "all"}
              :date now}
        sn (:sequence-number (put-record my-stream data (str (UUID/randomUUID))))]
    (put-record my-stream data (str (UUID/randomUUID)) sn))

  (Thread/sleep 3000)

  (def shard (-> (describe-stream my-stream)
               :stream-description
               :shards
               last
               :shard-id))

Update

I'm pretty sure that this is a bug in the library (or the serializer that it uses), so I'm continuing the investigation in a bug report at https://github.com/mcohen01/amazonica/issues/211.


Solution

  • Passing the record data as a ByteBuffer that wraps the bytes of a JSON string works for me.

    (kinesis/put-record "ad-stream"
                        (-> {:ad-id "ad-stream"}
                            json/write-str .getBytes ByteBuffer/wrap)
                        "partition-key")
    

    Record data: "eyJhZC1pZCI6ImFkLXN0cmVhbSJ9", which decodes to:

    {"ad-id":"ad-stream"}
    

    This works around any encoding issue in the library, because Amazonica skips encoding when it is passed a ByteBuffer.
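
The ByteBuffer step can be checked without calling Kinesis at all. The following is a minimal sketch, assuming the JSON text has already been produced (for example by `json/write-str`). `string->buffer` and `buffer->base64` are hypothetical helper names introduced for illustration, and the JSON is written as a literal so the snippet needs no extra dependencies.

```clojure
(import '(java.nio ByteBuffer)
        '(java.nio.charset StandardCharsets)
        '(java.util Base64))

(defn string->buffer
  "Wraps the UTF-8 bytes of s in a ByteBuffer (hypothetical helper)."
  ^ByteBuffer [^String s]
  (ByteBuffer/wrap (.getBytes s StandardCharsets/UTF_8)))

(defn buffer->base64
  "Base64-encodes the remaining bytes of buf. Works on a duplicate so
  the position of the original buffer is left untouched."
  [^ByteBuffer buf]
  (let [copy       (.duplicate buf)
        ^bytes out (byte-array (.remaining copy))]
    (.get copy out)
    (.encodeToString (Base64/getEncoder) out)))

;; Literal JSON standing in for (json/write-str {:ad-id "ad-stream"}).
(def json-text "{\"ad-id\":\"ad-stream\"}")

(def record-data (string->buffer json-text))

(println (buffer->base64 record-data))
;; prints eyJhZC1pZCI6ImFkLXN0cmVhbSJ9
```

A buffer built this way could be passed as the data argument of `kinesis/put-record`, as in the call above. Naming the charset explicitly avoids relying on the JVM default, which is what the no-argument `.getBytes` uses.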