firebase, google-cloud-platform, apache-beam, connector

Do we have a Firebase I/O connector for Apache Beam?


I looked for a Firebase I/O connector for Apache Beam but wasn't able to find one. Could someone point me to one, or share a Firebase I/O connector I could use to read and write my files?

Thanks in advance.


Solution

  • Beam's Java SDK ships a FirestoreIO connector. The following post on the official Google Cloud blog shows a read and write example with it:

    https://cloud.google.com/blog/topics/developers-practitioners/using-firestore-and-apache-beam-data-processing

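    import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.gcp.firestore.FirestoreIO;
    import org.apache.beam.sdk.io.gcp.firestore.RpcQosOptions;
    import org.apache.beam.sdk.transforms.Create;
    import org.apache.beam.sdk.transforms.MapElements;
    import org.apache.beam.sdk.transforms.ParDo;
    import org.apache.beam.sdk.transforms.Reshuffle;
    import org.apache.beam.sdk.values.TypeDescriptors;

    // CreatePartitionQueryRequest and CreateDeleteOperation are helper classes
    // that are not part of Beam and must be defined separately.
    Pipeline pipeline = Pipeline.create(options);

    String collectionGroupId = "collection-group-name";

    // Tell the connector's rate limiter how many workers may issue RPCs.
    RpcQosOptions rpcQosOptions = RpcQosOptions.newBuilder()
        .withHintMaxNumWorkers(
            options.as(DataflowPipelineOptions.class).getMaxNumWorkers())
        .build();

    pipeline
        .apply(Create.of(collectionGroupId))
        // Split the collection group into partitions that can be read in parallel.
        .apply(new CreatePartitionQueryRequest(rpcQosOptions.getHintMaxNumWorkers()))
        .apply(FirestoreIO.v1().read().partitionQuery().withNameOnlyQuery().build())
        .apply(FirestoreIO.v1().read().runQuery().build())
        // Keep only the document name from each query response.
        .apply(MapElements.into(TypeDescriptors.strings()).via(
            (runQueryResponse) -> runQueryResponse.getDocument().getName()))
        // Turn each document name into a delete write.
        .apply(ParDo.of(new CreateDeleteOperation()))
        .apply("shuffle writes", Reshuffle.viaRandomKey())
        .apply(
            FirestoreIO.v1().write()
                .batchWrite()
                .withRpcQosOptions(rpcQosOptions)
                .build());

    pipeline.run().waitUntilFinish();
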

    The Javadoc for FirestoreIO:

    https://beam.apache.org/releases/javadoc/2.41.0/org/apache/beam/sdk/io/gcp/firestore/FirestoreIO.html

    See also this example of writing with FirestoreIO:

    Add document to Firestore from Beam with auto generated ID

    For Python, as far as I know, Beam currently has no open-source Firestore I/O connector, but you can call the Firestore client from a DoFn inside a ParDo. This link shows an example:

    Using FireStore in Google Dataflow
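
Returning to the Java pipeline above: it uses two helper classes, CreatePartitionQueryRequest and CreateDeleteOperation, whose definitions are not shown. Below is a minimal sketch of what they could look like. It is not the original author's code. The wrapper class FirestoreHelpers and the methods partitionRequest and deleteOf are hypothetical names introduced for illustration. The sketch also assumes that the project ID is available through GcpOptions, that the database is "(default)", and that the beam-sdks-java-io-google-cloud-platform dependency is on the classpath.

```java
import com.google.firestore.v1.PartitionQueryRequest;
import com.google.firestore.v1.StructuredQuery;
import com.google.firestore.v1.StructuredQuery.CollectionSelector;
import com.google.firestore.v1.StructuredQuery.Direction;
import com.google.firestore.v1.StructuredQuery.FieldReference;
import com.google.firestore.v1.StructuredQuery.Order;
import com.google.firestore.v1.Write;
import org.apache.beam.sdk.extensions.gcp.options.GcpOptions;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;

/** Hypothetical helpers for the pipeline above; none of this ships with Beam. */
public final class FirestoreHelpers {
  private FirestoreHelpers() {}

  /** Builds a partition query over every collection with the given ID. */
  static PartitionQueryRequest partitionRequest(
      String projectId, String collectionGroupId, long partitionCount) {
    // Partition queries must target a collection group (all descendants)
    // and be ordered by document name, ascending.
    StructuredQuery query = StructuredQuery.newBuilder()
        .addFrom(CollectionSelector.newBuilder()
            .setCollectionId(collectionGroupId)
            .setAllDescendants(true))
        .addOrderBy(Order.newBuilder()
            .setField(FieldReference.newBuilder().setFieldPath("__name__"))
            .setDirection(Direction.ASCENDING))
        .build();
    return PartitionQueryRequest.newBuilder()
        // Assumption: the default database of the given project.
        .setParent("projects/" + projectId + "/databases/(default)/documents")
        .setStructuredQuery(query)
        // Desired maximum number of partition points; must be positive.
        .setPartitionCount(partitionCount)
        .build();
  }

  /** Builds a write that deletes the document with the given full name. */
  static Write deleteOf(String documentName) {
    return Write.newBuilder().setDelete(documentName).build();
  }

  /** Maps each collection group ID to a PartitionQueryRequest. */
  static final class CreatePartitionQueryRequest
      extends PTransform<PCollection<String>, PCollection<PartitionQueryRequest>> {
    private final long partitionCount;

    CreatePartitionQueryRequest(long partitionCount) {
      this.partitionCount = partitionCount;
    }

    @Override
    public PCollection<PartitionQueryRequest> expand(PCollection<String> input) {
      return input.apply("create partition queries", ParDo.of(
          new DoFn<String, PartitionQueryRequest>() {
            @ProcessElement
            public void processElement(ProcessContext c) {
              // Assumption: the project is set in the pipeline's GcpOptions.
              String projectId =
                  c.getPipelineOptions().as(GcpOptions.class).getProject();
              c.output(partitionRequest(projectId, c.element(), partitionCount));
            }
          }));
    }
  }

  /** Maps each document name to a delete operation for batchWrite(). */
  static final class CreateDeleteOperation extends DoFn<String, Write> {
    @ProcessElement
    public void processElement(ProcessContext c) {
      c.output(deleteOf(c.element()));
    }
  }
}
```

With the classes nested like this, the pipeline would refer to them as FirestoreHelpers.CreatePartitionQueryRequest and FirestoreHelpers.CreateDeleteOperation from the same package, or you could move them into the pipeline class as nested static classes. Keeping the request building in plain static methods makes it possible to unit test that logic without running a pipeline.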