google-cloud-platform apache-flink google-cloud-dataproc google-cloud-bigtable

Flink-BigTable - Any connector?


I would like to use BigTable as a sink for a Flink job:

  1. Is there an out-of-the-box connector?
  2. Can I use the DataStream API?
  3. How can I efficiently write a sparse object (99% of its fields are null), i.e. ensure that no key/value is created in BigTable for null fields?

I searched the documentation for these topics but could not find answers to these questions.

Thanks for your support!


Solution

  • I do not think that Flink has a native BigTable connector.

    That said, you can use the Flink HBase SQL connector together with the Cloud Bigtable HBase client to access BigTable from Flink:

    Flink job <-> Flink HBase SQL Connector <-> BigTable HBase client <-> BigTable
    

    This connector appears to be similar to the Flink HBase connector provided by Cloudera, which can be installed manually (see the comment by @rsantiago).

    For persisting sparse data, one possible approach comes from Cloudera's example, where columns are added with put.addColumn: at that point you can check whether each value is null and skip it, so no cell is written for it (see the comment by @rsantiago).
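The "Flink job <-> Flink HBase SQL connector <-> BigTable HBase client <-> BigTable" route can be sketched as below. This is a minimal sketch, assuming Flink's `hbase-2.2` SQL connector and the `bigtable-hbase-2.x` client are both on the job's classpath; the connector's `properties.*` options are forwarded to the HBase client configuration, and here they are used to swap in Bigtable's connection class. Whether a given connector version and Bigtable client version interoperate is an assumption to verify. The table name `bt_sink`, column family `cf`, qualifiers `col_a`/`col_b`, the `datagen` source table `events`, and the project/instance/table IDs are hypothetical placeholders.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class BigtableSqlSink {

  // Builds the DDL for a Flink HBase SQL table whose HBase connection is served by
  // the Cloud Bigtable HBase client instead of a real HBase cluster (assumption).
  static String sinkDdl(String projectId, String instanceId, String tableId) {
    return String.join("\n",
        "CREATE TABLE bt_sink (",
        "  rowkey STRING,",
        "  cf ROW<col_a STRING, col_b BIGINT>,", // hypothetical family and qualifiers
        "  PRIMARY KEY (rowkey) NOT ENFORCED",
        ") WITH (",
        "  'connector' = 'hbase-2.2',",
        "  'table-name' = '" + tableId + "',",
        // The connector requires this option; it is a placeholder here because
        // Bigtable does not use ZooKeeper (assumption).
        "  'zookeeper.quorum' = 'unused:2181',",
        // 'properties.*' entries are passed through to the HBase client configuration.
        "  'properties.hbase.client.connection.impl' = 'com.google.cloud.bigtable.hbase2_x.BigtableConnection',",
        "  'properties.google.bigtable.project.id' = '" + projectId + "',",
        "  'properties.google.bigtable.instance.id' = '" + instanceId + "'",
        ")");
  }

  public static void main(String[] args) {
    TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode());

    // Hypothetical test source producing random rows.
    tEnv.executeSql(
        "CREATE TABLE events (id STRING, name STRING, cnt BIGINT) "
            + "WITH ('connector' = 'datagen', 'rows-per-second' = '1')");

    tEnv.executeSql(sinkDdl("my-project", "my-instance", "my-table"));

    // Nest the non-key columns into the column-family ROW, as the HBase connector expects.
    tEnv.executeSql("INSERT INTO bt_sink SELECT id, ROW(name, cnt) FROM events");
  }
}
```

Keeping the DDL in its own method makes the generated table definition easy to inspect before the job is submitted.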
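For the DataStream API and the sparse-data question, the put.addColumn approach could look like the custom sink below. This is a sketch, assuming the Cloud Bigtable HBase client (`BigtableConfiguration.connect`) and a Flink 1.x release where `RichSinkFunction` is available. Each record is modeled as a hypothetical `Map<String, String>` whose `rowkey` entry holds the row key; every null value is skipped, so no cell is created for it. The column family `cf` is also a placeholder.

```java
import com.google.cloud.bigtable.hbase.BigtableConfiguration;
import java.util.Map;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

// Writes sparse records to Bigtable, creating cells only for non-null fields.
public class SparseBigtableSink extends RichSinkFunction<Map<String, String>>
    implements CheckpointedFunction {

  static final String ROW_KEY_FIELD = "rowkey";     // hypothetical row-key field name
  static final byte[] FAMILY = Bytes.toBytes("cf"); // hypothetical column family

  private final String projectId;
  private final String instanceId;
  private final String tableId;
  private transient Connection connection;
  private transient BufferedMutator mutator;

  public SparseBigtableSink(String projectId, String instanceId, String tableId) {
    this.projectId = projectId;
    this.instanceId = instanceId;
    this.tableId = tableId;
  }

  // Builds a Put holding only the non-null fields; returns null if there is nothing to write.
  static Put toSparsePut(Map<String, String> record) {
    String rowKey = record.get(ROW_KEY_FIELD);
    if (rowKey == null) {
      return null;
    }
    Put put = new Put(Bytes.toBytes(rowKey));
    for (Map.Entry<String, String> field : record.entrySet()) {
      if (field.getKey().equals(ROW_KEY_FIELD) || field.getValue() == null) {
        continue; // null value: no cell is created for this column
      }
      put.addColumn(FAMILY, Bytes.toBytes(field.getKey()), Bytes.toBytes(field.getValue()));
    }
    return put.isEmpty() ? null : put;
  }

  @Override
  public void open(Configuration parameters) throws Exception {
    connection = BigtableConfiguration.connect(projectId, instanceId);
    mutator = connection.getBufferedMutator(TableName.valueOf(tableId));
  }

  @Override
  public void invoke(Map<String, String> record, Context context) throws Exception {
    Put put = toSparsePut(record);
    if (put != null) {
      mutator.mutate(put); // buffered; sent in batches by the client
    }
  }

  @Override
  public void snapshotState(FunctionSnapshotContext ctx) throws Exception {
    // Flush pending mutations before a checkpoint completes (at-least-once delivery).
    if (mutator != null) {
      mutator.flush();
    }
  }

  @Override
  public void initializeState(FunctionInitializationContext ctx) {
    // No Flink state to restore; the sink only needs the checkpoint hook.
  }

  @Override
  public void close() throws Exception {
    if (mutator != null) {
      mutator.close();
    }
    if (connection != null) {
      connection.close();
    }
  }
}
```

It could be attached with `stream.addSink(new SparseBigtableSink("my-project", "my-instance", "my-table"))`. Newer Flink releases deprecate `SinkFunction` in favor of the unified `Sink` API; there, the same null-skipping logic would move into a `SinkWriter`.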