I was doing a CTF reversing challenge when I came across this C code in Ghidra:
int main(void)
{
    int iVar1;
    char input[32];

    fwrite("Password: ",1,10,stdout);
    __isoc99_scanf("DoYouEven%sCTF",input);
    iVar1 = strcmp(input,"__dso_handle");
    if ((-1 < iVar1) && (iVar1 = strcmp(input,"__dso_handle"), iVar1 < 1)) {
        printf("Try again!");
        return 0;
    }
    iVar1 = strcmp(input,"_init");
    if (iVar1 == 0) {
        printf("Correct!");
    }
    else {
        printf("Try again!");
    }
    return 0;
}
Upon writing my own similar code, I noticed that the program only stores something in input when my input starts with DoYouEven, and that it only stores whatever comes after that prefix. I am trying to understand the reasoning for this from the source code of scanf.c and vfscanf.c, but I am unable to follow the actual logic behind it.
My snippet:
#include <stdio.h>
#include <string.h>

int main(void)
{
    char input[32];

    printf("Input: ");
    int a = scanf("DoYouEven%sCTF",input);
    printf("Input was: %s\n", input);
    int iVar = strcmp(input,"__dsohandle");
    printf("iVar: %d\n", iVar);
    printf("strcmp __dsohandle > -1: %d\n", (-1 < iVar));
    printf("strcmp __dso_handle: %d\n", iVar);
    printf("strcmp _init: %d\n", strcmp(input,"_init"));
    printf("%d\n",a);   /* return value of scanf */
    return 0;
}
Output:
Enter your input: DoYouEvenByee
input is: Byee
iVar: -29
strcmp __dsohandle > -1: 0
strcmp __dso_handle: -29
strcmp _init: -29
1
Can anybody help me understand this through source code? I am not a master at understanding C libraries.
To build upon @dbush's answer, here is a more detailed sequence of the behavior of scanf("DoYouEven%sCTF", input):
scanf tries to match the format string, reading the input stream one byte at a time:
1. At the end of the format string, it returns the number of successful conversions, which at that point equals the number of conversion specifiers in the format: 1 in this case.
2. For any character in the format string that is neither whitespace nor a %, scanf reads the next byte from the stream and:
2a) if end of file is reached or a read error occurs, scanf returns the number of successful conversions so far, or EOF if none has completed yet.
2b) if the byte matches the character, the process continues at step 1.
2c) otherwise, the match fails: the byte is pushed back into the stream (as if by ungetc()) and the number of successful conversions so far is returned. If the user input does not start with DoYouEven, 0 is returned.
3. For any whitespace character (eg: ' ', '\t', '\n'...), scanf reads the stream and consumes any whitespace bytes, not necessarily the same as the format string character. This directive itself never fails: skipping stops at the first non-whitespace byte, which is pushed back, or when no more bytes can be read, and the process continues at step 1.
4. If the character is a %, the following characters are interpreted as a conversion specifier and the conversion is attempted. For %s, scanf retrieves the next argument passed, which must be a pointer to a modifiable array of char. It then reads and discards any white space from stdin; if end-of-file or a read error occurs before a non-whitespace byte is found, this is a conversion failure handled as in 2a. Otherwise it reads the non-whitespace bytes from stdin and stores them into the array pointed to by the pointer. This stops at the first whitespace byte read from the stream (which is pushed back), at end-of-file, or on a read error. A null terminator is stored after the bytes, the number of successful conversions is incremented and the process continues at step 1.
These steps imply three problems for this specific call:
- the number of bytes stored into input is not limited, so any user input with more than 31 non-whitespace characters after DoYouEven invokes undefined behavior: bytes are stored beyond the end of the array, overwriting other data... Try it for yourself. This is a typical misuse of scanf() that can often be exploited by hackers. To fix this potential security flaw, write: scanf("DoYouEven%31sCTF", input)
- the characters following the %s, CTF, cannot be matched by user input: %s only stops at end-of-file or at whitespace, so the next byte in the stream is either missing or a whitespace byte, and neither matches the C. Because this mismatch occurs after the conversion has completed, scanf() still returns 1.
- the return value of scanf is not checked: if it differs from 1, the user input was not stored into input, which remains uninitialized. Comparing the contents of this array with strcmp invokes undefined behavior.
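The return values described above can be observed without typing anything by applying the same format to a string. This is only a minimal sketch: try_parse is a hypothetical helper name, it uses sscanf instead of scanf so the input is a fixed string, and it already includes the %31s width limit suggested above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: applies the question's format (with a width limit
   added) to a string instead of stdin, so the return value is easy to inspect.
   out must point to an array of at least 32 chars. */
int try_parse(const char *line, char out[32])
{
    out[0] = '\0';  /* keep out a valid string even if nothing is stored */
    return sscanf(line, "DoYouEven%31sCTF", out);
}
```

With this helper, "DoYouEvenByee" yields 1 and stores Byee, "Hello" yields 0 because the literal prefix does not match, an empty string yields EOF, and "DoYouEven_initCTF" yields 1 but stores _initCTF, because %s swallows the trailing CTF before the literal part of the format can match it.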
Note also other problems in this code:
- stdout should be flushed with fflush(stdout) before attempting to read from stdin with scanf() to ensure the prompt is visible to the user. Most C libraries flush stdout automatically upon reading from stdin when these streams are connected to a terminal, but this behavior is not mandatory so it is better to force the flush explicitly.
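As an illustration of the explicit flush, here is a small sketch. show_prompt is a hypothetical helper, not something from the original program; it writes the prompt and flushes the stream before the caller goes on to read input.

```c
#include <stdio.h>

/* Hypothetical helper: prints a prompt and flushes it so that it is
   visible before any subsequent read. Returns 0 on success, -1 on error. */
int show_prompt(FILE *out, const char *text)
{
    if (fputs(text, out) == EOF)
        return -1;
    return fflush(out) == 0 ? 0 : -1;
}
```

A caller would use show_prompt(stdout, "Password: ") immediately before the call to scanf().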
Ghidra's output code is somewhat contorted:
iVar1 = strcmp(input,"__dso_handle");
if ((-1 < iVar1) && (iVar1 = strcmp(input,"__dso_handle"), iVar1 < 1)) {
- -1 < iVar1 is just an obfuscated way of testing iVar1 >= 0.
- calling strcmp a second time with the same arguments might return a different value, but with the same sign. This seems to indicate the original code has 2 calls to strcmp, very intriguing. The second operand of && amounts to ((iVar1 = strcmp(input,"__dso_handle")) < 1).
- iVar1 < 1 is much less readable than the equivalent iVar1 <= 0.
The whole test can be simplified as if (iVar1 >= 0 && iVar1 <= 0), which itself is equivalent to if (iVar1 == 0), in other words if the string typed by the user is exactly DoYouEven__dso_handle. Given the next test, comparing user input to "__dso_handle" is entirely redundant and can be removed.
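To make the redundancy concrete, here is a sketch that puts the decompiled logic and the simplified one side by side. The function names decompiled_check and simplified_check are hypothetical, and the printf calls of the original are replaced by a return value (1 for "Correct!", 0 for "Try again!") so the two versions can be compared.

```c
#include <string.h>

/* The test as decompiled by Ghidra: two strcmp calls against "__dso_handle". */
int decompiled_check(const char *input)
{
    int iVar1 = strcmp(input, "__dso_handle");
    if ((-1 < iVar1) && (iVar1 = strcmp(input, "__dso_handle"), iVar1 < 1))
        return 0;                        /* "Try again!" */
    return strcmp(input, "_init") == 0;  /* 1 means "Correct!" */
}

/* Simplified equivalent: only the comparison with "_init" matters. */
int simplified_check(const char *input)
{
    return strcmp(input, "_init") == 0;
}
```

Both functions return the same result for any string, since an input equal to "__dso_handle" cannot also be equal to "_init".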
The goal of this reverse engineering session is to show the security flaw in the call to scanf() that can be exploited.