I would like to know whether it's possible to write compressed files using the fileio module from the Apache Beam Python SDK. At the moment, I am using the module to write files to a GCP bucket:
    import apache_beam as beam
    from apache_beam.io import fileio
    from apache_beam.transforms import window

    _ = (logs
         | 'Window' >> beam.WindowInto(window.FixedWindows(60 * 60))
         | 'Convert to JSON' >> beam.ParDo(ConvertToJson())
         | 'Write logs to GCS file' >> fileio.WriteToFiles(
             path=gcs_output_path, shards=1, max_writers_per_bundle=0))
Compression would help minimize storage costs.
According to this doc and a comment inside the class _MoveTempFilesIntoFinalDestinationFn, developers still need to implement handling of compression.
Am I right about this, or does someone know how to do it?
Thank you!
"developers still need to implement handling of compression."
This is correct.
Though there are open FRs tracking this.
At the moment, you can write a DoFn as a workaround: read each final file, compress it, write the compressed copy, and delete the original. A sketch follows.
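Here is a minimal sketch of that workaround, assuming the finalized files live under a hypothetical gs://my-bucket/logs/ prefix and that gzip output is acceptable (CompressFileFn and the file pattern are illustrative, not from the original pipeline). It leans on FileSystems.create, whose default CompressionTypes.AUTO infers gzip compression from the .gz suffix:

    import apache_beam as beam
    from apache_beam.io import fileio
    from apache_beam.io.filesystems import FileSystems

    class CompressFileFn(beam.DoFn):
        """Gzips one finalized file and deletes the uncompressed original."""

        def process(self, file_metadata):
            src_path = file_metadata.path
            dst_path = src_path + '.gz'
            # Read the uncompressed final file.
            with FileSystems.open(src_path) as src:
                data = src.read()
            # CompressionTypes.AUTO (the default) gzips the output
            # because of the .gz suffix on the destination path.
            with FileSystems.create(dst_path) as dst:
                dst.write(data)
            # Delete the original so only the compressed copy is stored.
            FileSystems.delete([src_path])
            yield dst_path

    # Run as a follow-up step once WriteToFiles has finalized its outputs.
    # Adjust the (hypothetical) pattern so reruns don't re-match .gz files.
    with beam.Pipeline() as p:
        _ = (p
             | 'Match final files' >> fileio.MatchFiles('gs://my-bucket/logs/*')
             | 'Compress' >> beam.ParDo(CompressFileFn()))

Note that this sketch reads each file fully into memory, so it only suits files that fit on a worker; for larger files you would stream the bytes in chunks instead.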