c getc

How to input a hexadecimal value from the console for getc


As we know, if we write a string like "\x61\x61" in a C source file, it actually means "aa". When typing a char at the console for getc or fgetc to read, is there any way to give a hex value instead? Maybe something like '\x61' rather than 'a'.


Solution

  • The short summary is no. But you probably want a bit more than this.

    Assuming your environment uses some superset of ASCII (which, while not required by the language, is a pretty reasonable assumption for any machine and OS from this century), "\x61\x61" is "aa". The conversion is done at compile time: if you inspect the compiler's output (for instance, by reading the assembly code it emits), you'll find aa in there, not \x61\x61. Escape sequences exist to let people write characters that would otherwise be awkward or impossible to put in a source file. The most popular example is the character with code point zero, written \x00 or \000 and essentially always abbreviated to \0 (which is safe as long as it's not followed by a digit in the 0-7 range).
    The key takeaway here is that your program does not see \x61\x61, but rather aa. You cannot recover the source representation — just like you can't tell 24, 030 and 0x18 apart.
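To make the compile-time point concrete, here is a minimal sketch. The function names same_string and same_number are hypothetical, chosen only for this illustration, and the string comparison assumes an ASCII-compatible execution character set.

```c
#include <string.h>

/* Returns 1 when the escaped spelling and the plain spelling yield the
   same string. Assumes an ASCII-compatible execution character set,
   where 0x61 is the code for 'a'. */
int same_string(void)
{
    return strcmp("\x61\x61", "aa") == 0;
}

/* Returns 1: decimal 24, octal 030 and hexadecimal 0x18 all denote the
   same int value, so the program cannot tell which spelling was used. */
int same_number(void)
{
    return 24 == 030 && 030 == 0x18;
}
```

Both functions compare values that already exist in the compiled program; neither has any way to ask how the source spelled them.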

    On the other hand, getc and friends read raw text input. They do no processing beyond newline conversion in text mode. If you want escape sequences interpreted, you have to do that in your own code. Such processing would also have to handle invalid sequences (such as \xyz) and resizing and moving the string around (because \x61 is four characters and a is one), which are problems that aren't as simple as they might seem at first glance. Imposing this burden on all applications for the odd one that needs this specific processing would be unreasonable.

    If you know you're going to be reading a hexadecimal escape sequence (and not straight characters), then you can just read hexadecimal input using scanf:

    unsigned char next;
    int rv = scanf("\\x%2hhx", &next);
    // rv is 1 if an escape was read (next then holds the character),
    // 0 if the input did not match, or EOF if input ended first
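
Because scanf returns the number of items it assigned, or EOF when input runs out, comparing the result against 1 is safer than treating it as a boolean (EOF is nonzero). A minimal sketch of that check follows; it uses sscanf on a string so it can be tried without a console, and the helper name parse_hex_escape is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: parse one \xHH escape at the start of text.
   Returns 1 and stores the byte in *out on success, 0 otherwise.
   The result of sscanf is compared with 1 because EOF is nonzero. */
int parse_hex_escape(const char *text, unsigned char *out)
{
    return sscanf(text, "\\x%2hhx", out) == 1;
}
```

With this helper, the six-character C literal "\\x61" (the four characters \x61 at run time) parses to the byte 0x61, while plain "a" or an empty string is rejected.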
    

    This approach, however, will not work for strings that mix in escape sequences, such as x\x79z. For those strings, you'll have to write an actual string processor to convert them — much like the compiler does with your code.
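
As a rough idea of what such a string processor might look like, here is a minimal sketch. It makes several assumptions that are not part of the question: escapes always have exactly two hex digits, only \x escapes are recognized (any other backslash is copied through unchanged), and the character set is ASCII-compatible. The names hex_value and decode_hex_escapes are hypothetical.

```c
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit.
   Assumes an ASCII-compatible character set (contiguous a-f and A-F). */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: copy src into dst, replacing every \xHH sequence
   with the byte it names. Returns the number of bytes written (not
   counting the terminator), or -1 if an escape is malformed or dst is
   too small. */
long decode_hex_escapes(const char *src, char *dst, size_t dst_size)
{
    size_t out = 0;

    while (*src != '\0') {
        unsigned char byte;

        if (src[0] == '\\' && src[1] == 'x') {
            /* Only look at the second digit if the first one was valid,
               so we never read past the end of the string. */
            int hi = hex_value((unsigned char)src[2]);
            int lo = hi < 0 ? -1 : hex_value((unsigned char)src[3]);
            if (hi < 0 || lo < 0)
                return -1;              /* invalid sequence such as \xyz */
            byte = (unsigned char)(hi * 16 + lo);
            src += 4;                   /* four input characters consumed */
        } else {
            byte = (unsigned char)*src++;
        }

        if (out + 1 >= dst_size)
            return -1;                  /* keep room for the terminator */
        dst[out++] = (char)byte;
    }

    if (dst_size == 0)
        return -1;
    dst[out] = '\0';
    return (long)out;
}
```

Given the run-time text x\x79z and a large enough buffer, this sketch writes xyz and returns 3; given \xyz it returns -1. The length is returned separately because a decoded \x00 would otherwise look like the end of the string.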