Tags: haskell, c-strings, syslog, bytestring, haskell-ffi

Most efficient way of converting a Data.ByteString.Lazy to a CStringLen


I need to encode some data to JSON and then push it to the syslog using hsyslog. The types of the two relevant functions are:

Aeson.encode :: ToJSON a => a -> Data.ByteString.Lazy.ByteString

System.Posix.Syslog.syslog :: Maybe Facility
                           -> Priority
                           -> CStringLen
                           -> IO () 

What's the most efficient way (in speed and memory) to convert a Lazy.ByteString to a CStringLen? I found Data.ByteString.Unsafe, but it works only with strict ByteString, not Lazy.ByteString.

Should I just use unsafeUseAsCStringLen . Data.String.Conv.toS and call it a day? Will it do the right thing with respect to efficiency?


Solution

  • I guess I would use Data.ByteString.Lazy.toStrict in place of toS, to avoid the additional package dependency.

    Anyway, you won't find anything more efficient than:

    unsafeUseAsCStringLen (toStrict lbs) $ \cstrlen -> ...
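
    As a minimal sketch, the one-liner above could be wrapped in a small helper. The name withLazyCStringLen is hypothetical, and the callback here is only a stand-in for the real syslog call: it copies the bytes back out so the round trip can be checked using nothing but the bytestring package.

    ```haskell
    import qualified Data.ByteString as BS
    import qualified Data.ByteString.Lazy as BL
    import qualified Data.ByteString.Lazy.Char8 as BLC
    import Data.ByteString.Unsafe (unsafeUseAsCStringLen)
    import Foreign.C.String (CStringLen)

    -- Hypothetical helper (not part of any library): make the lazy
    -- ByteString strict, then expose its buffer without copying.
    -- The pointer is only valid inside the callback, and the callback
    -- must not write through it.
    withLazyCStringLen :: BL.ByteString -> (CStringLen -> IO a) -> IO a
    withLazyCStringLen lbs = unsafeUseAsCStringLen (BL.toStrict lbs)

    main :: IO ()
    main = do
      -- Stand-in for the output of Aeson.encode.
      let payload = BLC.pack "{\"msg\":\"hello\"}"
      -- Stand-in for the syslog call: copy the bytes back out of the
      -- C buffer and return them together with the reported length.
      (len, copy) <- withLazyCStringLen payload $ \cstr@(_, n) -> do
        bs <- BS.packCStringLen cstr
        pure (n, bs)
      print len
      print (copy == BL.toStrict payload)
    ```

    In the real application, the callback would instead be something along the lines of syslog facility priority, using the signature quoted in the question.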
    

    In general, toStrict is an "expensive" operation, because a lazy ByteString will generally be made up of a bunch of "chunks" each consisting of a strict ByteString and not necessarily yet loaded into memory. The toStrict function must force all the strict ByteString chunks into memory and ensure that they are copied into a single, contiguous block as required for a strict ByteString before the no-copy unsafeUseAsCStringLen is applied.

    However, toStrict handles a lazy ByteString that consists of a single chunk optimally without any copying.

    In practice, aeson uses an efficient Data.ByteString.Builder to create the JSON, and if the JSON is reasonably small (less than 4k, I think), it will build a single-chunk lazy ByteString. In this case, toStrict is zero-copy, unsafeUseAsCStringLen is zero-copy, and the entire operation is basically free.
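
    To check how many chunks a given lazy ByteString actually has, one option is to count the result of toChunks. The sketch below uses Data.ByteString.Builder directly as a stand-in for aeson's encoder (an assumption: aeson's own buffer strategy may differ), and chunkCount is a hypothetical helper name. The exact counts depend on the bytestring version, so no expected output is shown.

    ```haskell
    import qualified Data.ByteString.Builder as B
    import qualified Data.ByteString.Lazy as BL

    -- Hypothetical helper: number of strict chunks in a lazy ByteString.
    -- A result of 0 or 1 is the case toStrict handles without
    -- concatenating several buffers.
    chunkCount :: BL.ByteString -> Int
    chunkCount = length . BL.toChunks

    main :: IO ()
    main = do
      -- A small payload, built the way a JSON encoder might build it.
      let small = B.toLazyByteString (B.string7 "{\"msg\":\"hello\"}")
      -- A much larger payload (about 1 MB) for comparison.
      let large = B.toLazyByteString
                    (mconcat (replicate 100000 (B.string7 "0123456789")))
      putStrLn ("small payload chunks: " ++ show (chunkCount small))
      putStrLn ("large payload chunks: " ++ show (chunkCount large))
    ```

    If the count for typical log messages is 1, the toStrict step adds no concatenation work.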

    But note that, in your application, where you are passing the string to the syslogger, fretting about the efficiency of this operation is crazy. My guess would be that you'd need thousands of copy operations to even make a dent in the performance of the overall action.