Tags: ubuntu, google-cloud-dataflow, apache-beam, osmium

Configure Linux Distro on Google Dataflow Workers?


My Beam/Dataflow pipeline includes a data-processing step that uses a Linux CLI tool (specifically, osmium-tool version >= 1.9.1), which is only packaged for some newer Ubuntu releases (cosmic or disco). I cannot find any documentation on the OS specs of Dataflow workers, or on whether the OS is configurable.

Any help would be appreciated, thanks!


Solution

  • One of the goals of the Beam portability effort is to allow customizations such as this. Although it is still experimental at this point, you can follow the instructions at https://beam.apache.org/documentation/runtime/environments/ to fully configure the environment (a container image) in which your user code runs.
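As a rough illustration of the approach above, here is a minimal Python sketch, assuming you have already built and pushed a custom worker image with osmium-tool installed. The image name is hypothetical, and the flag names (`--environment_type`, `--environment_config`) are the ones I believe the linked page describes for portable runners; they may differ by Beam version and runner (newer releases also accept `--sdk_container_image`), so check the documentation for your version.

```python
import shutil
import subprocess

# Hypothetical image name: replace with an image you built and pushed yourself,
# based on a Beam SDK image with osmium-tool >= 1.9.1 installed on top.
CUSTOM_IMAGE = "gcr.io/my-project/beam-osmium:latest"


def container_pipeline_args(image):
    """Return pipeline flags asking the runner to run user code in `image`.

    Assumption: these flag names match your Beam version; verify before use.
    """
    return [
        "--environment_type=DOCKER",
        f"--environment_config={image}",
    ]


def osmium_version():
    """Return the output of `osmium --version`, or None if osmium is not on PATH.

    Calling this on a worker is a quick way to confirm that the custom
    image is really the environment the user code runs in.
    """
    if shutil.which("osmium") is None:
        return None
    result = subprocess.run(
        ["osmium", "--version"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

The list returned by `container_pipeline_args(CUSTOM_IMAGE)` would be appended to the usual pipeline arguments (runner, project, and so on), and `osmium_version()` could be called from inside a transform to log which version the workers actually see.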