Tags: timer, cancellation, azure-durable-functions, long-running-processes

Is it safe to terminate an orchestrator function that is waiting on a timer?


If I call IDurableOrchestrationClient.TerminateAsync on a running orchestrator that is currently sleeping and waiting on a durable timer created through IDurableOrchestrationContext.CreateTimer, will this result in a graceful shutdown of the entire process including the timer?

The documentation on timers has a very explicit warning regarding graceful cancellation of any timers that didn't activate in the orchestration:

Warning

Use a CancellationTokenSource (.NET) or call cancel() on the returned TimerTask (JavaScript) to cancel a durable timer if your code will not wait for it to complete. The Durable Task Framework will not change an orchestration's status to "completed" until all outstanding tasks are completed or canceled.

Will calling Terminate respect these restrictions? The documentation on it doesn't make this clear at all.

Scenario

I have a requirement where I need to "monitor" some instance for some time after it is created, and to stop monitoring it once the instance is deleted.

For now, my application raises EventGrid events on creation and deletion of entities, and I listen to those in the function that triggers my orchestration: once a creation event is received, I start the orchestration that monitors that instance. Once the deletion event is received, I send an external event to the orchestrator to signal that it should stop monitoring that instance and exit. Each creation/deletion pair is handled by its own monitor, of course (they are isolated from one another, controlled by an explicit instance ID based on the entity ID).

The orchestrator logic basically fires 2 tasks and waits for the first to finish:

  1. A timer, based on data in the created instance. Its duration can range from minutes to months
  2. A WaitForExternalEvent that waits until the trigger function notifies of a cancellation

If task 2 wins, I cancel the timer (as per the recommendation), and exit the function. If task 1 wins, I do the processing I need on the instance and ignore the external event.

The orchestrator itself looks something like this:

    [FunctionName(nameof(StartExpirationTracking))]
    public async Task StartExpirationTracking(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        var entity = context.GetInput<Entity>();

        // Task 1: durable timer that fires at the entity's expiration date.
        using var cancellationTokenSource = new CancellationTokenSource();
        var expirationTimeTask = context.CreateTimer(entity.ExpirationDate, cancellationTokenSource.Token);

        // Task 2: external event raised by the trigger function when the entity is deleted.
        var stopTrackingEventTask = context.WaitForExternalEvent("StopTracking");

        var winner = await Task.WhenAny(expirationTimeTask, stopTrackingEventTask);
        if (winner == stopTrackingEventTask)
        {
            // The entity was deleted first: cancel the pending timer so the orchestration can complete.
            cancellationTokenSource.Cancel();

            return;
        }

        // The timer fired first: process the expiration.
        await ProcessExpiration(entity);
    }
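
The trigger side of this external-event approach can be sketched roughly as follows. This is only an illustration under stated assumptions: the `ExpirationTrackingTriggers` class, its method names, and the minimal `Entity` shape (an `Id` string alongside the `ExpirationDate` used above) are hypothetical, while `StartNewAsync` and `RaiseEventAsync` are the `IDurableOrchestrationClient` calls for starting an instance with an explicit ID and sending it an event.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

// Hypothetical minimal entity shape; only ExpirationDate appears in the orchestrator above.
public class Entity
{
    public string Id { get; set; }
    public DateTime ExpirationDate { get; set; }
}

// Hypothetical helpers, assumed to be called from the EventGrid-triggered function.
public static class ExpirationTrackingTriggers
{
    // Creation event: start one monitor per entity, using the entity ID as the instance ID.
    public static Task<string> OnEntityCreatedAsync(
        IDurableOrchestrationClient client, Entity entity)
    {
        return client.StartNewAsync("StartExpirationTracking", entity.Id, entity);
    }

    // Deletion event: signal the matching monitor so it cancels its timer and returns.
    public static Task OnEntityDeletedAsync(
        IDurableOrchestrationClient client, string entityId)
    {
        return client.RaiseEventAsync(entityId, "StopTracking", null);
    }
}
```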

The question then becomes: instead of sending an external event and waiting for that to terminate the orchestration, would it be safe to just call Terminate directly on it from the trigger?

This would simplify my logic a bit since I wouldn't need to care about any external events in the orchestration and could just do this:

    [FunctionName(nameof(StartExpirationTracking))]
    public async Task StartExpirationTracking(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        var entity = context.GetInput<Entity>();

        await context.CreateTimer(entity.ExpirationDate, CancellationToken.None);
        await ProcessExpiration(entity);
    }

but I'm not sure if terminating the execution is safe when dealing with durable timers (I couldn't find any info on that). Would terminating the instance also terminate the timers in a graceful manner? Would this approach change function usage/costs in any way? Or should I keep my ExternalEvent approach to explicitly cancel the execution gracefully?

This is an updated cross-post from https://github.com/Azure/azure-functions-durable-extension/discussions/2115


Solution

  • Yes, it's safe to terminate an orchestration, even if it has outstanding timers (or pending activity or sub-orchestration calls, for that matter). The orchestration will always transition into the "Terminated" state. Any messages that arrive for an instance after it has been terminated (including timer messages) will be silently discarded.
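
With that in mind, the deletion handler could terminate the monitor directly instead of raising an event. The following is a minimal sketch, not a definitive implementation: the `ExpirationTrackingTermination` class and `StopTrackingAsync` method are hypothetical names, and it assumes the instance ID equals the entity ID, as described in the question. `GetStatusAsync` and `TerminateAsync` are existing `IDurableOrchestrationClient` methods.

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

// Hypothetical helper, assumed to be called from the function handling the deletion event.
public static class ExpirationTrackingTermination
{
    public static async Task StopTrackingAsync(
        IDurableOrchestrationClient client, string entityId)
    {
        // GetStatusAsync returns null when no instance with this ID exists.
        DurableOrchestrationStatus status = await client.GetStatusAsync(entityId);
        if (status == null)
        {
            return;
        }

        // Only terminate instances that have not finished yet.
        bool isActive =
            status.RuntimeStatus == OrchestrationRuntimeStatus.Running ||
            status.RuntimeStatus == OrchestrationRuntimeStatus.Pending;

        if (isActive)
        {
            // The reason string is recorded with the terminated instance.
            await client.TerminateAsync(entityId, "Entity deleted");
        }
    }
}
```

The status check is a defensive choice: terminating an instance that does not exist or has already finished is expected to fail rather than silently succeed, so the sketch skips the call in those cases.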