apache-beamapache-beam-io

How to run BigQueryIO.read().fromQuery with parameters


I need to run multiple queries from a single .SQL file but with different params I've tried something like this but it does not work as BigQueryIO.Read consumes only PBegin.

  public PCollection<KV<String, TestDitoDto>> expand(PCollection<QueryParamsBatch> input) {

    PCollection<KV<String, Section1Dto>> section1 = input.apply("Read Section1 from BQ",
                BigQueryIO
                        .readTableRows()
                        .fromQuery(ResourceRetriever.getResourceFile("query/test/section1.sql"))
                        .usingStandardSql()
                        .withoutValidation())
            .apply("Convert section1 to Dto", ParDo.of(new TableRowToSection1DtoFunction()));
  }

Are there any other ways to put params from existing PCollection inside my BigQueryIO.read() invocation?


Solution

  • I've come up with the following solution: not to use BigQueryIO but regular GCP library for accessing BigQuery, marking it as transient and initializing it each time in method with @Setup annotation, as it is not Serializable

    public class DenormalizedCase1Fn extends DoFn<*> {
    
      private transient BigQuery bigQuery;
    
      @Setup
      public void initialize() {
        this.bigQuery = BigQueryOptions.newBuilder()
                .setProjectId(bqProjectId.get())
                .setLocation(LOCATION)
    
                .setRetrySettings(RetrySettings.newBuilder()
                        .setRpcTimeoutMultiplier(1.5)
                        .setInitialRpcTimeout(Duration.ofSeconds(5))
                        .setMaxRpcTimeout(Duration.ofSeconds(30))
                        .setMaxAttempts(3).build())
                .build().getService();
      }
    
      @ProcessElement 
      ...