apache-beam apache-beam-io

How to make BigQueryIO wait for some DoFn input


In Apache Beam, once you have some PCollection input, you can apply a transform to it:

input.apply(ParDo.of(new SomeDoFn()))

However, BigQueryIO.read() can be applied only to the Pipeline instance itself. How can I make BigQueryIO.read() wait until some other DoFn finishes, or at least produces one output? Should BigQueryIO go into a separate pipeline, or can this be done within the same one?


Solution

  • I don't think it's possible to make BigQueryIO.read() wait for some input, because it creates a PTransform<PBegin, PCollection<T>>. The PBegin input type means the transform is meant to run at the beginning of your pipeline.

    I also don't see any other "read" PTransforms implemented in the BigQueryIO connector that accept an input PCollection.

    So it will very likely be easier to run it as a separate pipeline and use something like Apache Airflow to orchestrate the two.
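
To make the typing argument above concrete, here is a minimal sketch in plain Java. The classes only borrow Beam's names (PBegin, PCollection, PTransform, Pipeline); they are simplified stand-ins written for illustration, not Beam's real classes, and fakeRead() and toUpper() are hypothetical helpers. The sketch shows why a transform whose input type is PBegin can be applied to the pipeline root but not to an existing PCollection.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins that only borrow Beam's names. These are NOT Beam's real classes.
final class PBeginSketch {

    interface PInput {}

    /** Marks the root of a pipeline: there is no upstream data to wait for. */
    static final class PBegin implements PInput {}

    /** A dataset produced by some earlier step. */
    static final class PCollection<T> implements PInput {
        final List<T> elements;

        PCollection(List<T> elements) {
            this.elements = elements;
        }

        /** Accepts only transforms whose input type is PCollection<T>. */
        <O> O apply(PTransform<PCollection<T>, O> transform) {
            return transform.expand(this);
        }
    }

    interface PTransform<I extends PInput, O> {
        O expand(I input);
    }

    static final class Pipeline {
        /** Accepts only transforms whose input type is PBegin. */
        <O> O apply(PTransform<PBegin, O> transform) {
            return transform.expand(new PBegin());
        }
    }

    /** Hypothetical source shaped like a read: PBegin in, PCollection out. */
    static PTransform<PBegin, PCollection<String>> fakeRead() {
        return begin -> new PCollection<>(List.of("row-1", "row-2"));
    }

    /** Hypothetical element-wise step: PCollection in, PCollection out. */
    static PTransform<PCollection<String>, PCollection<String>> toUpper() {
        return input -> {
            List<String> out = new ArrayList<>();
            for (String s : input.elements) {
                out.add(s.toUpperCase());
            }
            return new PCollection<>(out);
        };
    }

    public static void main(String[] args) {
        Pipeline pipeline = new Pipeline();

        // A PBegin-typed transform can start the pipeline.
        PCollection<String> rows = pipeline.apply(fakeRead());

        // A PCollection-typed transform can follow it.
        PCollection<String> upper = rows.apply(toUpper());

        // rows.apply(fakeRead());  // does not compile: fakeRead() expects PBegin

        System.out.println(upper.elements); // prints [ROW-1, ROW-2]
    }
}
```

Because the read has to start a pipeline, any "wait" has to come from outside it. In the two-pipeline setup, that would mean running the upstream pipeline to completion first (in Beam's Java SDK, pipeline.run().waitUntilFinish() blocks until the job ends) and only then launching the pipeline that contains the BigQuery read.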