I am streaming the download of an S3 file using amazonka, and I use the sinkBody function to continue the streaming. Currently, I download the file as follows:
getFile bucketName fileName = do
  resp <- send (getObject (BucketName bucketName) fileName)
  sinkBody (resp ^. gorsBody) sinkLazy
where sinkBody has the type:
sinkBody :: MonadIO m => RsBody -> ConduitM ByteString Void (ResourceT IO) a -> m a
In order to run in constant memory, I thought that sinkLazy was a good option for getting a value out of the conduit stream.
After this, I would like to save the lazy bytestring of data (S3 file) into a local file, for which I use this code:
-- fetch stream of data from S3
bytestream <- liftIO $ AWS.runResourceT $ runAwsT awsEnv $ getFile serviceBucket key
-- create a file
liftIO $ writeFile filePath ""
-- write content of stream into the file (strict version), keeps data in memory...
liftIO $ runConduitRes $ yield bytestream .| mapC B.toStrict .| sinkFile filePath
The flaw in this code is that the whole lazy bytestring has to be "realised" in memory, so it cannot run in constant space.
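A minimal local sketch of what the pipeline above sends downstream, using a hypothetical three-chunk lazy ByteString named sample in place of the S3 data. yield followed by toStrict produces a single chunk as large as the whole file, whereas sourceLazy (exported by conduit's Conduit module) yields the lazy value's existing chunks one at a time. Note that sourceLazy only avoids the extra strict copy; if the lazy ByteString was built by sinkLazy, all of it is already in memory.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Conduit
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL

-- Hypothetical stand-in for the downloaded data: a lazy ByteString of three chunks.
sample :: BL.ByteString
sample = BL.fromChunks ["ab", "cd", "ef"]

-- The question's approach: yield the whole lazy value, then copy it into one
-- strict ByteString, so sinkFile receives a single chunk holding every byte.
chunkSizesViaYield :: BL.ByteString -> [Int]
chunkSizesViaYield lbs =
  runConduitPure $ yield lbs .| mapC BL.toStrict .| mapC B.length .| sinkList

-- sourceLazy re-emits the lazy ByteString's existing strict chunks one by one.
chunkSizesViaSourceLazy :: BL.ByteString -> [Int]
chunkSizesViaSourceLazy lbs =
  runConduitPure $ sourceLazy lbs .| mapC B.length .| sinkList

main :: IO ()
main = do
  print (chunkSizesViaYield sample)      -- [6]
  print (chunkSizesViaSourceLazy sample) -- [2,2,2]
```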
Is there any way to use conduit to yield a lazy bytestring and save it into a file in constant memory? Or is there another approach, not using sinkLazy, that saves the file in constant space?
EDIT
I also tested writing the lazy bytestring directly to a file, as follows, but this consumes about twice the file size in memory (writeFile here is from Data.ByteString.Lazy):
bytestream <- liftIO $ AWS.runResourceT $ runAwsT awsEnv $ getFile serviceBucket key
liftIO $ writeFile filePath bytestream
Well, the purpose of a streaming library like conduit is to realize some of the benefits of lazy data structures and actions (lazy ByteStrings, lazy I/O, etc.) while better controlling memory usage. The purpose of the sinkLazy function is to take data out of the conduit ecosystem, with its well-controlled memory footprint, and back into the wild West of lazy objects and their associated space leaks. So that's your problem right there.
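A small sketch of that behaviour, running sinkLazy on a hypothetical local stream of three strict chunks instead of an S3 body. sinkLazy does hand back a lazy ByteString, but only after it has pulled every chunk out of the stream, so the entire input is resident at once; the conduit documentation for sinkLazy notes this as well.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Conduit
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL

-- Hypothetical stand-in for the S3 body: three small strict chunks.
chunks :: [B.ByteString]
chunks = ["ab", "cd", "ef"]

-- sinkLazy returns only once the stream is exhausted, so every chunk has
-- already been read and is held inside the resulting lazy ByteString.
collectAll :: IO BL.ByteString
collectAll = runConduit $ yieldMany chunks .| sinkLazy

main :: IO ()
main = do
  lbs <- collectAll
  print lbs              -- "abcdef"
  print (BL.length lbs)  -- 6
```

With a real download the chunks come from the network, but the result is the same: nothing is returned until the last chunk has arrived and been stored.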
Rather than sink the stream out of conduit and into a lazy ByteString, you probably want to keep the data in conduit and sink the stream directly into the file, using something like sinkFile. I don't have an AWS test program up and running, but the following type checks and probably does what you want:
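import Conduit
import Control.Lens
import Data.Text (Text)
import Network.AWS
import Network.AWS.S3

-- Stream the object straight into outputFileName: sinkFile writes each
-- chunk as it arrives, so the whole object is never held in memory.
getFile :: MonadAWS m => Text -> ObjectKey -> FilePath -> m ()
getFile bucketName fileName outputFileName = do
  resp <- send (getObject (BucketName bucketName) fileName)
  sinkBody (resp ^. gorsBody) (sinkFile outputFileName)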
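Since the AWS part is hard to run locally, here is a sketch of the same plumbing with a hypothetical fakeSinkBody standing in for amazonka's sinkBody and a hypothetical fakeBody (256 chunks of 64 KiB, generated on demand) standing in for the response body. The consumer argument has the same type that sinkBody expects, so passing sinkFile writes each chunk to disk as it is produced; the output file name is also just an example.

```haskell
import Conduit
import Data.ByteString (ByteString)
import qualified Data.ByteString as B
import Data.Void (Void)
import System.IO (IOMode (ReadMode), hFileSize, withFile)

-- Hypothetical stand-in for amazonka's sinkBody: it runs a consumer of the
-- same type as sinkBody's second argument against a local source instead
-- of an HTTP response body.
fakeSinkBody
  :: MonadIO m
  => ConduitT () ByteString (ResourceT IO) ()   -- plays the role of RsBody
  -> ConduitT ByteString Void (ResourceT IO) a  -- the type sinkBody accepts
  -> m a
fakeSinkBody body consumer = liftIO $ runConduitRes (body .| consumer)

-- Hypothetical body: 256 chunks of 64 KiB (16 MiB in total), yielded one
-- at a time as the consumer asks for them.
fakeBody :: ConduitT () ByteString (ResourceT IO) ()
fakeBody = replicateC 256 (B.replicate 65536 120)

-- Mirrors the answer's getFile: sinkFile is the consumer, so chunks go to
-- disk as they arrive and no lazy ByteString is ever built.
saveBody :: FilePath -> IO ()
saveBody outputFileName = fakeSinkBody fakeBody (sinkFile outputFileName)

main :: IO ()
main = do
  let path = "fake-s3-object.bin"  -- hypothetical output file name
  saveBody path
  size <- withFile path ReadMode hFileSize
  print size  -- 16777216
```

Compared with the question's code, only the consumer changes: sinkFile instead of sinkLazy.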