I have written a piece of code that I am using to research the behavior of different libraries and functions. While doing so, I stumbled upon some strange behavior with sscanf.
I have a piece of code that reads an input into a buffer, then tries to put that value into a numeric variable.
When I call sscanf from main on the input buffer with the format specifier %x, it yields a garbage value if the input string is shorter than the buffer. Say I enter 0xff: I get an arbitrarily large, seemingly random number every time. But when I pass that buffer to a function, all calls to sscanf result in 255 (0xff) as I expect, regardless of the mismatch between type and format specifier.
My question is, why does this happen in the function main but not in the function test?
This is the code:
#include <stdio.h>

int test(char *buf){
    unsigned short num;
    unsigned int num2;
    unsigned long long num3;
    sscanf(buf, "%x", &num);
    sscanf(buf, "%x", &num2);
    sscanf(buf, "%x", &num3);
    printf("%x", num);
    printf("%x", num2);
    printf("%x", num3);
    return 0;
}

void main(){
    char buf[16];
    unsigned long long num;
    printf("%s","Please enter the magic number:");
    fgets(buf, sizeof(buf),stdin);
    sscanf(buf, "%x", &num);
    printf("%x\n", num);
    test(&buf);
}
I expect the behavior to be consistent: all calls should fail, or all calls should succeed, but this is not the case.
I have tried to read the documentation and do experiments with different types, format specifiers, and so on. This behavior is present across all numeric types.
I have tried compiling on different platforms; gcc on Linux and MSVC on Windows behave the same way.
I also disassembled the binary to see if the call to sscanf differs between main() and test(), but that assembly is identical. It loads the pointer to the buffer into a register and pushes that register onto the stack, and calls sscanf.
Now just to be clear: this happens consistently. num in main is never equal to num, num2 or num3 in test, but num, num2 and num3 are always equal to each other. I would expect this to cause undefined behavior and not be consistent. Output when run, every time:
./main
Please enter the magic number: 0xff
0xaf23af23423 <--- different every time
0xff <--- never different
0xff <--- never different
0xff <--- never different
My current reasoning is that in one instance sscanf is interpreting more bytes than in the other. It seems to keep evaluating the entire buffer and to be affected by residual data in memory.
I know I can make it behave correctly either by filling the buffer, with the last byte being a newline, or by using the correct format specifier to match the pointer type, "%llx" for main in this case. So that is not what I am wondering; I have made that error on purpose.
I am wondering why using the wrong format specifier works in one case but not in the other consistently when the code runs.
sscanf with %x should be used only with the address of an unsigned int. When the address of another object is passed, the behavior is not defined by the C standard.
With a pointer to a wider object, the additional bytes in the object may hold other values (possibly leftover from when the startup code prepared the process and called main). With a pointer to a narrower object, sscanf may write bytes outside of the object. With compiler optimization, a variety of additional behaviors are possible. These various possibilities may manifest as large numbers, corruption in data, program crashes, or other behaviors.
Additionally, printing with incorrect conversion specifiers is not defined by the C standard, and can cause errors in printf attempting to process the arguments passed to it.
Use %hx to scan into an unsigned short. Use %lx to scan into an unsigned long. Use %llx to scan into an unsigned long long. Also use those conversion specifiers when printing their corresponding types.
My question is, why does this happen in the function main but not in the function test?
One possibility is the startup code used a little stack space while setting up the process, and this left some non-zero data in the bytes that were later used for num in main. The bytes lower on the stack held zero values, and these bytes were later used for num3 in test.