crecursionsize-t

How does the recursion stack in my C program lead to an unexpected sum?


I'm working on a C program where I use recursion to sum even numbers from a sequence of user inputs. However, when I input -1 (knowing that size_t cannot store negative numbers), I end up with a large, unexpected sum in the final output.

I understand that using the wrong format specifier leads to undefined behavior (UB), but in this case, it doesn't seem to affect the base condition. The base condition appears to be working correctly, but the final sum is still incorrect.

#include <stdio.h>

size_t inpNum;

size_t sumEven()
{
    printf("Enter the seq:");
    scanf("%zu", &inpNum);

    if (inpNum == -1) {
        printf("Base condition: %zu\n", inpNum);
        return 0;
    } else if (inpNum % 2 == 0) {
        printf("EvenNumber: %zu\n", inpNum);
        return inpNum + sumEven();
    } else {
        return sumEven();
    }
}

int main()
{
    printf("The sum of all even numbers: %zu\n", sumEven());
}

Output

Enter the seq:4
EvenNumber:4
Enter the seq:2
EvenNumber:2
Enter the seq:1
Enter the seq:-1
Base condition:18446744073709551615
The sum of all even numbers:18446744073709551614

For Example:

sumEven(-1) --> returns 0 
sumEven(1)  --> waiting for sum from next call
inpNum + sumEven(2)  --> waiting for sum from next call
inpNum + sumEven(4)  --> waiting for sum from next call
sumEven() --> First call

0 to sumEven(1) it will return 0 to sumEven(2) it will return 2 + 0 to sumEven(4) to 4 + 2 will return to first call but it doesn't happen why?

Question:

Given that size_t can't store -1 and it’s interpreted as 18446744073709551615, how does the recursion stack process the calls to result in the final sum of 18446744073709551614? Specifically, how is the sum derived through the sequence of operations in the stack?

Context: I know the base condition is triggered due to implicit type casting by the compiler, but I'm struggling to understand how the sum is calculated as each recursive call returns. Why does the final sum not match the expected result from the recursive additions?


Solution

  • TLDR: The reason is likely that the evaluation order of the operands of the + operator is unspecified in all C standards, and the minimal fix is to use a local variable instead of a global one and also fix the format specifier.

    We cannot be sure about what happens unless we examine the assembly once there is UB in the code, but since you insist, I will try to explain what the compiler is likely doing in this case. Note that there is no guarantee that this is what actually happens.

    As a simple example, when you input 2 and -1 sequentially, what you believe is happening is that,

    1. sumEven() is called (top level)
    2. scanf() is called, changing inpNum to 2.
    3. the else if branch is entered, and the evaluation of inpNum+sumEven() is requested
    4. inpNum is evaluated, which is 2.
    5. sumEven() is called (nested level)
    6. scanf() is called, this is UB but in this particular case inpNum is changed to SIZE_MAX.
    7. The if branch is entered and the sumEven() call from 5. is returned
    8. sumEven() evaluates to 0
    9. inpNum+sumEven() is evaluated as 2+0, which gives 2.
    10. The sumEven() call from step 1 is returned, and the return value is 2 as expected.

    What is likely actually happening, is this instead,

    1. sumEven() is called (top level)
    2. scanf() is called, changing inpNum to 2.
    3. the else if branch is entered, and the evaluation of inpNum+sumEven() is requested. The compiler chooses to evaluate sumEven() before inpNum, which is allowed by the language. If you are too used to left-to-right evaluation order and it helps, you may conceptually think the compiler can evaluate it as either sumEven()+inpNum or inpNum+sumEven().
    4. sumEven() is called (nested level)
    5. scanf() is called, this is UB but in this particular case inpNum is changed to SIZE_MAX.
    6. The if branch is entered and the sumEven() call from 4. is returned
    7. sumEven() evaluates to 0
    8. inpNum is evaluated, which is now SIZE_MAX due to step 5.
    9. inpNum+sumEven() (or it helps conceptually, sumEven()+inpNum) is evaluated as SIZE_MAX+0, which gives SIZE_MAX.
    10. The sumEven() call from step 1 is returned, and the return value is SIZE_MAX and not the expected 2.

    As already commented, you have run into this issue due to incorrect and unnecessary use of global variables. Once you change it to a local variable, the incorrect coupling between different recursions is eliminated, and you can get the expected result. Still, you should fix the issue with the UB due to the incorrect format specifier.

    As is commented. You should use scanf() with care. The bare minimum for correct use of scanf() is to check the return value to guard against the case when the input is not an integer.