google-app-engine

Configuring a task queue and instance for non-urgent work


I am using an F4 instance (because of memory needs) with automatic scaling to do some background processing. It is run from a task queue. Each invocation takes 40s to 60s to complete. Because of the high memory needs, each instance should handle only one request at a time.

The action that needs to be done is not urgent. If it doesn't get scheduled for 30 minutes, that isn't a problem. Even 60 minutes is acceptable, and I'd rather make use of that time than spin up more instances. However, if the service gets popular and is receiving more than 60 requests an hour, I want to spin up more instances to make sure there isn't more than a 60-minute wait.
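A quick sanity check of where that 60-per-hour threshold comes from (assuming the worst-case 60s per invocation and one request per instance, per the numbers above; the class name is illustrative):

```java
// Worst-case per-instance throughput: one 60s task at a time.
public class Capacity {
    public static void main(String[] args) {
        int taskSeconds = 60;                   // each invocation takes 40-60s
        int perInstancePerHour = 3600 / taskSeconds;
        System.out.println(perInstancePerHour); // prints 60
    }
}
```

So a single serialized instance tops out at roughly 60 tasks/hr, which is why anything beyond 60 requests an hour calls for a second instance.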

I am having trouble figuring out how to configure the instance and queue parameters to keep my costs down but be able to scale in that way. My initial thought was something like this:

<queue>
    <name>non-urgent-queue</name>
    <target>slow-service</target>
    <rate>1/m</rate>
    <bucket-size>1</bucket-size>
    <max-concurrent-requests>1</max-concurrent-requests>
</queue>


<automatic-scaling>
    <min-idle-instances>0</min-idle-instances>
    <max-idle-instances>0</max-idle-instances>
    <min-pending-latency>20m</min-pending-latency>
    <max-pending-latency>1h</max-pending-latency>
    <max-concurrent-requests>1</max-concurrent-requests>
</automatic-scaling>

First of all, those latency settings are invalid, but I can't find documentation on the valid range or units. Where can I find that info?

Secondly, if I understand the queue settings correctly, this configuration (rate 1/m with bucket-size 1) would limit dispatch to 60 invocations an hour reaching the service, even if the task queue had 60+ jobs waiting.


Solution

  • This is how I ended up doing it. I use a slow queue and a fast queue configured like this:

    <queue>
        <name>slow-queue</name>
        <target>pdf-service</target>
        <rate>2/m</rate>
        <bucket-size>1</bucket-size>
        <max-concurrent-requests>1</max-concurrent-requests>
    </queue>
    <queue>
        <name>fast-queue</name>
        <target>pdf-service</target>
        <rate>10/m</rate>
        <bucket-size>1</bucket-size>
        <max-concurrent-requests>5</max-concurrent-requests>
    </queue>
    

    The max-concurrent-requests in the slow queue ensures only one task will run at a time, so there will only be one instance active.

    Before I post to the slow queue, I check how many items are already on it. The result may not be totally reliable, but for my purposes it is sufficient. In Java:

    import com.google.appengine.api.taskqueue.Queue;
    import com.google.appengine.api.taskqueue.QueueFactory;
    import com.google.appengine.api.taskqueue.QueueStatistics;

    // Check the slow queue's backlog before deciding where to post.
    Queue slowQueue = QueueFactory.getQueue("slow-queue");
    QueueStatistics queueStats = slowQueue.fetchStatistics();
    if (queueStats.getNumTasks() < 30) {
        // post to slow queue
    } else {
        // post to fast queue
    }
    

    So when my slow queue gets too full, I post to the fast queue, which allows concurrent requests.
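    The routing rule itself can be sketched as plain Java, independent of the App Engine API (the class and constant names are illustrative; the threshold of 30 and the queue names come from the check and config above):

    ```java
    // Illustrative sketch: prefer the cheap slow queue until its backlog
    // crosses a threshold, then spill over to the concurrent fast queue.
    public class QueueRouter {
        static final int SLOW_QUEUE_LIMIT = 30;

        static String chooseQueue(int slowQueueBacklog) {
            return slowQueueBacklog < SLOW_QUEUE_LIMIT ? "slow-queue" : "fast-queue";
        }

        public static void main(String[] args) {
            System.out.println(chooseQueue(10)); // prints "slow-queue"
            System.out.println(chooseQueue(45)); // prints "fast-queue"
        }
    }
    ```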

    The instance is configured like this:

    <automatic-scaling>
       <min-idle-instances>0</min-idle-instances>
       <max-idle-instances>automatic</max-idle-instances>
       <min-pending-latency>15s</min-pending-latency>
       <max-pending-latency>15s</max-pending-latency>
       <max-concurrent-requests>1</max-concurrent-requests>
    </automatic-scaling>
    

    So it will create new instances as slowly as possible (15s is the maximum pending latency) and make sure only one request runs on an instance at a time.

    With this configuration I'll have a maximum of 6 instances at a time, and that should handle about 500 tasks/hr. I could increase the rate and concurrent requests to do more.

    The downside of this solution is an element of unfairness: under heavy load, some tasks will be stuck in the slow queue while others get processed more quickly through the fast queue.

    Because of that, I have decreased the maximum number of items allowed on the slow queue to 13 so the unfairness won't be so extreme: maybe a 10-minute wait for jobs that go to the slow queue when it is full.
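    That 10-minute figure is consistent with the queue parameters: with max-concurrent-requests of 1 and tasks taking 40-60s, the slow queue drains at roughly one task per minute, so a backlog capped at 13 clears in about 11 minutes at the 50s midpoint (the class name is illustrative; the numbers are the ones stated above):

    ```java
    // Rough drain-time estimate for a full slow queue.
    public class WaitEstimate {
        public static void main(String[] args) {
            int backlog = 13;        // cap on slow-queue items
            double taskSeconds = 50; // tasks take 40-60s; use the midpoint
            // One task at a time, so drain time is backlog * per-task duration.
            double waitMinutes = backlog * taskSeconds / 60.0;
            System.out.printf("%.1f%n", waitMinutes); // prints 10.8
        }
    }
    ```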