Tags: google-cloud-platform, google-cloud-storage, google-cloud-pubsub, google-cloud-run, cloud-storage

Process 10 req/s and save to Cloud Storage - recommended method?


I receive about 10 requests per second, each producing a record like the entry below, and I need to save each record after a Cloud Run function completes. (My infrastructure is on Google Cloud Platform.) The data will be used as a dataset for machine learning.

{ 
  "text": "1k characters", 
  "text2": "1k characters", 
  "metadata1": "enum (100 vals)", 
  "metadata2": "number value" 
}

I planned to save each record to Google Cloud Storage with a non-awaited (fire-and-forget) call, either in a single folder or in folders keyed by the metadata1 enum. Is one layout better than the other?

Is this the appropriate route to take?

I think Pub/Sub is overkill as suggested in this SO answer.
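
For concreteness, the two layouts described above could be sketched roughly as follows. This is only an illustration, not tested against a real bucket: the `records/` prefix, the timestamp-plus-UUID naming scheme, and the helper names `object_name` and `save_record` are all assumptions made for the example, and it assumes the `google-cloud-storage` Python client is installed.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Optional


def object_name(record: dict, by_enum: bool, now: Optional[datetime] = None) -> str:
    """Build the object name for one record (hypothetical naming scheme).

    by_enum=False -> everything under one prefix: records/<leaf>
    by_enum=True  -> one prefix per metadata1 value: records/<metadata1>/<leaf>
    """
    now = now or datetime.now(timezone.utc)
    # A timestamp plus a random suffix keeps names unique at ~10 writes/s.
    leaf = f"{now:%Y%m%dT%H%M%S}-{uuid.uuid4().hex}.json"
    if by_enum:
        return f"records/{record['metadata1']}/{leaf}"
    return f"records/{leaf}"


def save_record(bucket_name: str, record: dict, by_enum: bool = True) -> str:
    """Upload one record as a JSON object and return its object name."""
    # Imported lazily so the naming helper above works without the client library.
    from google.cloud import storage

    client = storage.Client()
    blob = client.bucket(bucket_name).blob(object_name(record, by_enum))
    blob.upload_from_string(json.dumps(record), content_type="application/json")
    return blob.name
```

Cloud Storage has a flat namespace, so a "folder" here is only a prefix in the object name. The choice between the two layouts therefore mostly affects how easily the objects can be listed or filtered by metadata1 later, not how they are stored.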


Solution

  • I can propose two patterns, but in both cases you need to store the messages: