I would like to know if it's possible to scale the number of workers based on the number of unacknowledged Pub/Sub messages.
I have an API service that publishes messages to a queue, and workers then process these messages. Each job can take anywhere from 1 to 5 minutes, and I want to be able to scale up the number of instances as my application grows. I also want to be able to scale down to 0 to save costs when no messages have arrived for N minutes, since the instances I want to spin up will be high-CPU services that are expensive to keep alive.
One solution I have in mind is running a small Cloud Function that constantly polls the Pub/Sub queue to see how many unacked messages there are, and then manually spins up instances as needed.
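As a rough illustration of the decision logic such a poller would need, here is a minimal Python sketch. It assumes the poller already reads the backlog size (for example from the `pubsub.googleapis.com/subscription/num_undelivered_messages` Cloud Monitoring metric) and tracks how long the backlog has been empty. `ScalingPolicy`, `desired_instances`, and all default values are hypothetical, and the code does not call any GCP API.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    # Hypothetical tuning knobs; none of these come from a GCP API.
    avg_job_minutes: int = 3           # jobs take 1-5 minutes; assume ~3 on average
    target_drain_minutes: int = 10     # how quickly the backlog should be cleared
    max_instances: int = 20            # cost ceiling
    idle_minutes_before_zero: int = 15 # the "N minutes" from the question


def desired_instances(backlog: int, idle_minutes: float, current: int,
                      policy: ScalingPolicy) -> int:
    """Return how many worker instances the poller should ask for.

    backlog: unacked messages on the subscription.
    idle_minutes: minutes since the backlog was last non-zero.
    current: instances currently running.
    """
    if backlog == 0:
        # Keep current capacity until the idle window has elapsed, then go to zero.
        if idle_minutes >= policy.idle_minutes_before_zero:
            return 0
        return current
    # Each instance runs one job at a time, so within the drain window it can
    # clear roughly target_drain / avg_job messages (at least one).
    per_instance = max(1, policy.target_drain_minutes // policy.avg_job_minutes)
    needed = -(-backlog // per_instance)  # integer ceiling division
    return min(policy.max_instances, max(1, needed))
```

Keep in mind that Cloud Monitoring samples this metric periodically, so the value a poller sees can lag behind the real backlog, and you would still need separate code to actually start and stop instances.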
Is this the idiomatic approach when using App Engine Flex? I know there is a way to do it with Compute Engine: https://cloud.google.com/compute/docs/autoscaler/scaling-cloud-monitoring-metrics#scale_based_on_pubsub
Any real-world or practical advice is appreciated.
Thanks
Short answer: no. App Engine flex doesn't scale to 0, so your design can't meet one of your requirements (scaling down to zero when idle).
A better design is to use Cloud Run:
That should be OK as a foundation. You may have additional constraints (networking, number of CPUs, ...); let us know if you want to refine the design.
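One way to wire up Cloud Run, assumed here rather than taken from the answer, is a Pub/Sub push subscription that POSTs each message to the service: a success status acknowledges the message, and an error status makes Pub/Sub redeliver it. Below is a minimal stdlib-only sketch of such a worker. `decode_push_envelope`, `process_job`, and `serve` are hypothetical names, the job body is a placeholder, and the container's entrypoint is assumed to call `serve()`.

```python
import base64
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


def decode_push_envelope(body: bytes) -> str:
    """Return the text payload of a Pub/Sub push request body."""
    envelope = json.loads(body)
    message = envelope["message"]
    # `data` is base64-encoded; it can be missing for attribute-only messages.
    return base64.b64decode(message.get("data", "")).decode("utf-8")


def process_job(payload: str) -> None:
    """Hypothetical placeholder for the 1-5 minute CPU-heavy job."""
    print(f"processing {payload!r}")


class PushHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            payload = decode_push_envelope(body)
        except (ValueError, KeyError, TypeError):
            # Malformed envelope: any non-success status counts as a nack.
            self.send_response(400)
            self.end_headers()
            return
        try:
            process_job(payload)
        except Exception:
            # Job failed: a 5xx makes Pub/Sub redeliver the message later.
            self.send_response(500)
            self.end_headers()
            return
        # A success status (here 204) acknowledges the message.
        self.send_response(204)
        self.end_headers()


def serve() -> None:
    """Container entry point; Cloud Run provides the listening port in $PORT."""
    port = int(os.environ.get("PORT", "8080"))
    HTTPServer(("", port), PushHandler).serve_forever()
```

A single-threaded `HTTPServer` handles one request at a time, which pairs naturally with deploying the service with `--concurrency 1` so each CPU-heavy job gets a whole instance, while the default minimum of 0 instances lets it scale to zero. The push subscription's ack deadline (at most 600 seconds) and the Cloud Run request timeout (`--timeout`, whose default is 5 minutes) both need to cover the worst-case job duration.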