
What happens if memory for a format string is shared with one of the arguments of printf?


According to the C Standard, the signature of printf() is:

int printf(const char * restrict format, ...);

As I understand it, restrict means that format will be the only reference to the pointed-to data within the lifetime of the pointer. This enables optimizations for reasons I do not fully grasp. But does this mean that undefined behavior is invoked if I reuse the memory of the format string as an argument, given that, as far as I can tell, the format string is not required to be a string literal?

static const char str[] = "%sHello\n";
printf(str, str + 2); // Hello\nHello\n or UB?

I know the implementation might reuse the memory for identical or identically-ending string literals:

"foo" + 1 == "oo"; // Might be true

Does that mean that the following:

printf("%sHello\n", "Hello\n");

Might behave in a nonsensical manner if an implementation makes both string literals share memory, thus violating the restrict constraint?


Solution

  • As I understand it, restrict means that format will be the only reference to the pointed-to data within the lifetime of the pointer.

    Not quite. It means that if the format string¹ is modified by any means (through format or otherwise), then printf must access it only through expressions based on format (and therefore the caller must not pass another argument that would result in printf accessing the format string through that argument), per C 2018 6.7.3.1. If nothing in the format string is modified during the printf call, it does not matter what other pointers to it exist. In particular, both examples in the question have defined behavior, since printf only reads those strings.
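
The question's two calls can be checked with a short sketch. It uses snprintf instead of printf so the output lands in a buffer; snprintf's format parameter is restrict-qualified in the same way. The function names self_alias_example and literal_alias_example are hypothetical.

```c
#include <stdio.h>

/* The question's first example: the argument str + 2 points into the
   format string itself. Both are only read, so the aliasing is harmless. */
static const char str[] = "%sHello\n";

int self_alias_example(char *buf, size_t size)
{
    return snprintf(buf, size, str, str + 2);
}

/* The question's second example: whether the two literals share storage
   is unspecified, but nothing is modified, so the result is the same. */
int literal_alias_example(char *buf, size_t size)
{
    return snprintf(buf, size, "%sHello\n", "Hello\n");
}
```

Each function writes "Hello\nHello\n" (12 characters) into the buffer and returns 12, whether or not the implementation merges the literals.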

    I think the only way this can matter is if you use the n conversion specifier, which says the corresponding argument is a pointer to a signed integer into which is written the number of characters written to the output stream so far by this call to printf. So, if format were not restrict-qualified, you could write these two printf calls:

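    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void) {
        int n;  // Assumes sizeof(int) >= 4, so "x%n" and its null character fit.
        memcpy(&n, "x%n", 4);
        printf((char *) &n, &n);  // Would store 1 in `n`, since "x" had been written.

        const char String[] = "Hello, world.\n%n";
        int *p = malloc(sizeof String);
        if (p == NULL) return 1;
        memcpy(p, String, sizeof String);
        printf((char *) p, p);  // Would store int `14` at `p`.
        free(p);
    }
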

    The former would be defined because any object can be read through a character type, which printf presumably does in effect, and %n stores an int into n, whose declared type is int. The latter would be defined because memory from malloc has no declared type, so its effective type is set by whatever is stored into it (C 2018 6.5 6); writing to p as an int after it has been used as a character string is therefore fine.

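    Obviously, it would be a problem if the %n caused a write to memory that printf might still need to read as part of the format string. In the first example, assuming a 4-byte int, the stored value overwrites all of n, including the byte that held the terminating null character, which printf reads next; whether it still finds a null character there depends on the byte order. Again, this is in the context where we assume format is not restrict-qualified. Because it is, using %n in this way has undefined behavior.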
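
The interleaving of the %n store with continued reads of the format string can be seen in a deliberately tiny formatter. This is a hypothetical sketch, not how any real printf is implemented: toy_format copies ordinary characters to an output buffer, understands only %n, and, like printf, has a restrict-qualified format parameter. It assumes out is large enough for the copied characters plus a null character.

```c
#include <stdarg.h>

/* Hypothetical toy formatter: copies plain characters to out and handles
   only the %n conversion. Returns the number of characters copied. */
int toy_format(char *restrict out, const char *restrict format, ...)
{
    va_list ap;
    va_start(ap, format);
    int written = 0;
    for (const char *p = format; *p != '\0'; ++p) {
        if (p[0] == '%' && p[1] == 'n') {
            /* Store the count so far. If this int overlapped the format
               string, the later reads of *p would see the modified bytes.
               Because format is restrict-qualified, such overlap is
               undefined, so the compiler may assume it does not happen. */
            *va_arg(ap, int *) = written;
            ++p; /* skip the 'n' */
        } else {
            out[written++] = *p;
        }
    }
    out[written] = '\0';
    va_end(ap);
    return written;
}
```

Called as toy_format(out, "Hello, world.\n%n", &count) with a separate int count, it copies the message and sets count to 14. Passing a pointer that overlaps the format string instead, as in the examples above, is exactly the case restrict rules out.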

    (n can also be used with length modifiers, as in %hhn to write a signed char instead of an int, but this does not affect the above analysis.)
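
For contrast, a well-defined use of %n and %hhn passes pointers to objects that are separate from the format string. This is a minimal sketch using snprintf so nothing is printed; the helper name record_counts is hypothetical, and it assumes an implementation that honors %n (Microsoft's C runtime, for example, disables %n by default).

```c
#include <stdio.h>

/* Hypothetical helper: formats a fixed message and records how many
   characters were produced, once as an int (%n) and once as a signed
   char (%hhn). Neither destination overlaps the format string. */
void record_counts(int *as_int, signed char *as_schar)
{
    char buf[32];
    snprintf(buf, sizeof buf, "Hello, world.\n%n%hhn", as_int, as_schar);
}
```

Both *as_int and *as_schar receive 14, the length of "Hello, world.\n".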

    Footnote

    ¹ Technically, any object accessed through an lvalue based on format, which is essentially all the elements of the array object that format points into (including earlier elements in the array if format points into the middle of an array).