c printf wchar-t format-specifiers widechar

Behaviour of fprintf when the 'ls' format specifier is used with a precision to print a string of wide characters


In the C17 draft of the C standard (N2176), 7.21.6.1 paragraph 8 says:

If no precision is specified, the array shall contain a null wide character. If a precision is specified, no more than that many bytes are written (including shift sequences, if any), and the array shall contain a null wide character if, to equal the multibyte character sequence length given by the precision, the function would need to access a wide character one past the end of the array. In no case is a partial multibyte character written.

What does this mean?

Suppose I have an array of 2 wchar_t elements holding two valid non-null wide characters. If I set the precision to 8, assuming wchar_t is 4 bytes wide, what is the behaviour of this print request? Does it print those 2 characters? Or is the behaviour undefined because there is no third element holding a null wide character?


ChatGPT summary

I'm having difficulty understanding the last few sentences of 7.21.6.1 paragraph 8, so I asked ChatGPT to summarize it. Here's the summary:

When an l length modifier is used with a format specifier (e.g., %ls), the argument must be a pointer to an array of wchar_t. Here's what happens:

  • Conversion: Each wide character in the array is converted to a multibyte character using the wcrtomb function. The conversion starts with a reset state (mbstate_t initialized to zero).
  • Termination: Conversion continues up to and includes the null wide character (L'\0'), and the resulting multibyte string does not include its null terminator.
  • Precision:
    • If no precision is specified, the array must contain a null wide character to mark the end.
    • If precision is specified, at most that many bytes (including any shift sequences) are written. If this limit requires accessing one character past the array’s end, the array must still include a null wide character.
  • Partial Characters: A partial multibyte character is never written.

Is this summary correct, especially the 2nd point under the precision section? If so, please give me a 'correct' and an 'incorrect' code example for this statement: "If this limit requires accessing one character past the array’s end, the array must still include a null wide character."


Solution

  • The optional precision field limits the number of bytes produced in the output for the given source string. Only as many wchar_t elements are read from the source array as are needed to produce output up to this limit. In your example, if the 2 wide characters produce 8 bytes of output, exactly the specified precision, printf will not attempt to read a third wide character from the source, so no null terminator is needed and you can pass an array of 2 wchar_t. What matters is the length of the converted multibyte sequence, not sizeof(wchar_t): if the two characters convert to fewer than 8 bytes, printf has to look at a third element, and the array must then contain a null wide character.

    Relying on this is risky: the number of bytes varies with the locale and the actual wide character values, so it is always safer to pass null-terminated strings to printf (wide or not) and use the precision field only to limit the output. It is a bit counterintuitive that the precision for wide strings counts bytes after conversion rather than wide characters from the source, but counting bytes after conversion is consistent with the width field and allows proper truncation of the output.

    Note also that the C Standard is not fully explicit about the case where the bytes produced by converting the current wide character exceed the space remaining under the precision: these bytes are not written at all, but does printf stop converting the string there, or does it continue with the next wide character until it either fills the available space or reaches a null terminator? I would certainly consider the latter an error.
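
To make the first point concrete, here is a minimal sketch of a 'correct' and an 'incorrect' call, as I read the quoted paragraph. The helper names (format_wide, demo_exact_fit, demo_terminated) are made up for this illustration. It assumes that L'a' and L'b' each convert to exactly one byte, which should hold because they are members of the basic character set. The undefined case appears only as a comment, so that nothing in the sketch actually executes it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: format src with "%.*ls" into out, so that at most
 * `precision` bytes of converted output are produced. Returns snprintf's result. */
static int format_wide(char *out, size_t out_size, const wchar_t *src, int precision)
{
    assert(out != NULL && out_size > 0 && precision >= 0);
    return snprintf(out, out_size, "%.*ls", precision, src);
}

/* 'Correct' without a terminator: the two wide characters convert to
 * 2 bytes (assumption: one byte each), which equals the precision, so
 * a third element never needs to be accessed. */
int demo_exact_fit(char *out, size_t out_size)
{
    const wchar_t two[2] = { L'a', L'b' };  /* deliberately NOT null-terminated */
    return format_wide(out, out_size, two, 2);
}

/* The safer pattern: a null-terminated wide string, with the precision
 * used only as an upper bound on the output bytes. */
int demo_terminated(char *out, size_t out_size)
{
    const wchar_t s[] = L"ab";              /* includes the terminating L'\0' */
    return format_wide(out, out_size, s, 8);
}

/* 'Incorrect' (not executed here):
 *
 *     const wchar_t two[2] = { L'a', L'b' };
 *     printf("%.8ls", two);
 *
 * The two characters yield only 2 bytes, so to reach 8 bytes the function
 * would need to access two[2], one past the end of the array. The array
 * contains no null wide character, so the "shall" requirement is violated
 * and the behaviour is undefined. */
```

Calling demo_exact_fit(buf, sizeof buf) or demo_terminated(buf, sizeof buf) with a char buf[16] should leave "ab" in buf and return 2 in both cases. The exact-fit call is well defined only for as long as the byte count happens to match the precision, which is why the terminated form is the one to prefer.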