
Near-real-time streaming data from hundreds of customers to Google Pub/Sub to GCS


I am receiving near-real-time data from hundreds of customers. I need to store this data in Google Cloud Storage, in a separate location for each customer, i.e. /gcs/customer_id/yy/mm/day/hhhh/

My data is in Avro. I think I can use the "Pub/Sub to Avro Files on Cloud Storage" template. However, I'm not sure whether Google Pub/Sub can accept data from multiple customers. I'd appreciate any help here, thanks!


Solution

  • The template is quite simple: it takes all the data from Pub/Sub and stores it in Avro files on GCS. It does not split the output per customer.

    However, it's a good starting point, and you can build on it to add a split per customer and the file path that you want.

    You can find the Java source of the template on GitHub.
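As a rough illustration of the per-customer split, here is a minimal sketch of the path logic only, not the template's actual code. It assumes each message carries a customer identifier (for example in a Pub/Sub message attribute, here hypothetically called customer_id) and that the "hhhh" segment of the desired layout means the hour of the event in UTC. The class and method names are made up for this example. In a modified pipeline, a function like this could supply the destination for a dynamic file write such as Beam's FileIO.writeDynamic().

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: builds the object prefix customer_id/yy/MM/dd/HH/
// for one message. Names and layout are assumptions for illustration.
final class CustomerPath {

    // Two-digit year, month, day and hour, always rendered in UTC.
    private static final DateTimeFormatter HOURLY =
            DateTimeFormatter.ofPattern("yy/MM/dd/HH").withZone(ZoneOffset.UTC);

    private CustomerPath() {}

    static String objectPrefix(String customerId, Instant eventTime) {
        // Reject ids that would break or escape the per-customer directory.
        if (customerId == null || customerId.isEmpty() || customerId.contains("/")) {
            throw new IllegalArgumentException("invalid customer id: " + customerId);
        }
        return customerId + "/" + HOURLY.format(eventTime) + "/";
    }

    public static void main(String[] args) {
        // Example: a message from customer "acme" published at 14:30 UTC.
        System.out.println(objectPrefix("acme", Instant.parse("2024-03-05T14:30:00Z")));
        // prints acme/24/03/05/14/
    }
}
```

Grouping messages by this prefix before writing keeps each customer's files in their own directory, even when every customer publishes to the same topic.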