A lambda continuation with a generic type does not produce a stack overflow, but when I specify the type it results in a stack overflow. Why?

In a program I'm writing, I'm faced with a recursive type that I want to go through recursively (it seems necessary). Out of curiosity, I wanted to try and write a tail recursive version of my code, so as to avoid stack overflows when the data to be traversed is deeply recursive (I'll skip the details). I arrived at a version where, to achieve my goal, I had to implement continuations so that in the end my code is tail recursive or "continuation recursive" (i.e. the continuation is the last operation of an execution branch). I don't know how F# / .NET would optimize this style of code (i.e. recursive continuation), that's why I tried.

I then stumbled for a long time over a particular behavior that I've only just put my finger on. My continuation is of the following type: cont: option<int> -> 'a, but I know that in the end the generic type 'a is simply a unit. By replacing this deduced type myself, the runtime behavior changes:

When I specify the type, cont: option<int> -> unit, the code overflows the stack.
When I do not specify the type and leave it generic, cont: option<int> -> 'a, the code doesn't overflow and terminates.

I don't understand what produces such behavior. I tried to produce a minimal version of my real code and came up with a code that is quite different but shows the same behavior:

type Expr = 
    | Number of int
    | Add of Expr * Expr  
    | Multiply of Expr * Expr

module Evaluator =
    let rec private evalCPS expr (cont: int option -> 'a) =   // I speak about this continuation
        match expr with
        | Number n -> 
            cont (Some n)
        | Add(left, right) ->
            let continuation leftResult =
                match leftResult with
                | Some leftVal ->
                    evalCPS right (fun rightResult ->
                        match rightResult with
                        | Some rightVal -> cont (Some(leftVal + rightVal))
                        | None -> cont None)
                | None -> cont None
            
            evalCPS left continuation
            
        | Multiply(left, right) ->
            let continuation leftResult =
                match leftResult with
                | Some leftVal ->
                    evalCPS right (fun rightResult ->
                        match rightResult with
                        | Some rightVal -> cont (Some(leftVal * rightVal))
                        | None -> cont None)
                | None -> cont None
                
            evalCPS left continuation

    let evaluate expr =
        let mutable result = None
        evalCPS expr (fun r -> result <- r)
        result

// Generates a structure of specified complexity for testing purpose.
let generateExpr n =
    let random = Random()
    
    let rec generate size =
        if size <= 1 then 
            Number(random.Next(1, 10))
        else
            let leftSize = random.Next(1, size)
            let rightSize = size - leftSize
            
            match random.Next(2) with
            | 0 -> Add(generate leftSize, generate rightSize)
            | _ -> Multiply(generate leftSize, generate rightSize)
            
    generate n


let expr = generateExpr 50_000 // big depth for the test
let result = Evaluator.evaluate expr
printfn "%d" result

I'm not really interested in whether or not this code is idiomatic or improvable since it's not my real code. I'm only interested in why specifying the actual type rather than leaving it generic in a higher-order function produces a stack overflow when the version with generic doesn't. So that's my question: why? How ? And what could we learn from this from the user's point of view?

I'm not sure if this is F#-specific behavior because I know that generics still live in .NET bytecode, and I wouldn't be surprised if the reason for this behavior originated there, but I don't know C# so I can't write equivalent code.

I use dotnet 9.0.102 on Linux Manjaro and run the code with dotnet run --configuration Release. In the case where the code is compiled in debug rather than in release, in both cases the code produces an overflow.

Solution

If you compare the two snippets in how they are compiled (generic, unit), you can see one crucial difference in the way evalCPS is invoked.

Without unit:

public override a Invoke(FSharpOption<int> leftResult)
{
    if (leftResult != null)
    {
        int value = leftResult.Value;
        return evalCPS(right, new evalCPS@17-1<a>(cont, value));
    }
    return cont.Invoke(null);
}

With:

public override Unit Invoke(FSharpOption<int> leftResult)
{
    if (leftResult != null)
    {
        int value = leftResult.Value;
        evalCPS(right, new evalCPS@17-1(cont, value));
        return null;
    }
    return cont.Invoke(null);
}

This corresponds to your Add and Multiply handlers, but notice that one returns the result of calling evalCPS, while the other just returns null (the sole value of the Unit type). This could be a tail call candidate, and indeed, the CIL reveals that the generic version has a tail call:

IL_0005: ldarg.1
IL_0006: stloc.0
IL_0007: ldloc.0
IL_0008: call instance !0 class [FSharp.Core]Microsoft.FSharp.Core.FSharpOption`1<int32>::get_Value()
IL_000d: stloc.1
IL_000e: ldarg.0
IL_000f: ldfld class _/Expr class _/Evaluator/evalCPS@15<!a>::right
IL_0014: ldarg.0
IL_0015: ldfld class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<class [FSharp.Core]Microsoft.FSharp.Core.FSharpOption`1<int32>, !0> class _/Evaluator/evalCPS@15<!a>::cont
IL_001a: ldloc.1
IL_001b: newobj instance void class _/Evaluator/'evalCPS@17-1'<!a>::.ctor(class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<class [FSharp.Core]Microsoft.FSharp.Core.FSharpOption`1<int32>, !0>, int32)
IL_0020: tail. // here
IL_0022: call !!0 _/Evaluator::evalCPS<!a>(class _/Expr, class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<class [FSharp.Core]Microsoft.FSharp.Core.FSharpOption`1<int32>, !!0>)
IL_0027: ret

This explains why the unit version takes up all the stack space, because there there is no tail call and so the recursion causes the stack to grow over time.

In this case, the tail call optimization is actually impossible, since evalCPS is compiled as a void-returning method, while the continuation needs to return Unit to be usable as an FSharpFunc.

Returning any other type than the duplicitous unit should solve this, because then there is no need to convert it back and forth from void. However, I believe F# mistakenly treats this function as tail-recursive, so I have also filed a bug report to see if this behaviour can be improved.