An MRE will be tricky here because this relies on (a very simple) secondary terminal executable, but I'll try my best:
The current setup code is
SECURITY_ATTRIBUTES attrs;
attrs.nLength = sizeof(SECURITY_ATTRIBUTES);
attrs.bInheritHandle = TRUE;
attrs.lpSecurityDescriptor = NULL;
STARTUPINFO startup;
ZeroMemory(&startup, sizeof(startup));
startup.cb = sizeof(startup);
startup.dwFlags = STARTF_USESTDHANDLES;
HANDLE h_read, h_write, h_parent, h_child;
CreatePipe(&h_read, &h_write, &attrs, 0ul);
h_parent = h_write;
h_child = h_read;
SetHandleInformation(h_parent, HANDLE_FLAG_INHERIT, FALSE);
startup.hStdInput = h_child;
int fd = _open_osfhandle((intptr_t)h_parent, _O_TEXT | _O_WRONLY); // (long) would truncate the handle on 64-bit Windows
FILE *stdin_ = _fdopen(fd, "wt");
setvbuf(stdin_, NULL, _IOLBF, 1024);
CreatePipe(&h_read, &h_write, &attrs, 0ul);
h_parent = h_read;
h_child = h_write;
SetHandleInformation(h_parent, HANDLE_FLAG_INHERIT, FALSE);
startup.hStdOutput = h_child;
fd = _open_osfhandle((intptr_t)h_parent, _O_TEXT | _O_RDONLY);
FILE *stdout_ = _fdopen(fd, "rt");
setvbuf(stdout_, NULL, _IOLBF, 1024);
CreatePipe(&h_read, &h_write, &attrs, 0ul);
h_parent = h_read;
h_child = h_write;
SetHandleInformation(h_parent, HANDLE_FLAG_INHERIT, FALSE);
startup.hStdError = h_child;
fd = _open_osfhandle((intptr_t)h_parent, _O_TEXT | _O_RDONLY);
FILE *stderr_ = _fdopen(fd, "rt");
setvbuf(stderr_, NULL, _IOLBF, 1024);
char args[] = "surrogate.exe"; // cannot be const
PROCESS_INFORMATION info;
ZeroMemory(&info, sizeof(info));
CreateProcess(
    NULL,             // lpApplicationName
    args,             // lpCommandLine
    NULL,             // lpProcessAttributes
    NULL,             // lpThreadAttributes
    TRUE,             // bInheritHandles: only the child-side handles are inherited
    CREATE_NO_WINDOW, // dwCreationFlags
    NULL,             // lpEnvironment: use parent's environment
    NULL,             // lpCurrentDirectory: use parent's current directory
    &startup,         // lpStartupInfo
    &info             // lpProcessInformation
);
CloseHandle(info.hThread);
The surrogate should be able to reproduce the situation by printing the following to stdout -
File header v1.0
Execution: 30
but with interspersed flushes every eight bytes.
The setup works for the code I'll show below; if I convert the setvbuf calls to use _IONBF, it does not.
// Expect and discard string "File header v1.0"
fscanf(stdout_, "File header v1.0\n");
// Expect and discard "Execution: ", expect integer '30' parsed
int time_flag;
fscanf(stdout_, "Execution: %d", &time_flag);
In both buffering modes, all pipes open successfully and I get some content. In line buffering mode (which is likely coerced to full-buffered mode), all parsing works up to and including the time flag. In non-buffered mode, the fscanf call does not block to read the entire format string; it performs a partial read of 8 bytes, not even on a whitespace boundary. I know this is the case because if I substitute the second fscanf with

char next[256];
fscanf(stdout_, "%s", next);

I see a fragmentary "n:" in next. This leads me to the following questions:
1. I don't use OVERLAPPED anywhere, so I would have assumed that this pipe I/O is fully synchronous. Does the API standard actually bear this out or not?
2. Does fscanf offer a guarantee of synchronous, blocking behaviour? cppreference only says (emphasis mine): "For every conversion specifier other than n, the longest sequence of input characters which does not exceed any specified field width and which either is exactly what the conversion specifier expects or *is a prefix of a sequence it would expect*, is what's consumed from the stream." I cannot tell what "longest sequence" implies. The language around "prefix" is somewhat surprising: if that's true and fscanf might arbitrarily bail without processing the entire format string, then that might be my problem (?)
3. Is there a standard way to explicitly force the CRT, at either the integer file descriptor or FILE* level, to consider the stream blocking, and to avoid early-returning on a scan of partially-available data?

I will reiterate that if this seems like a bug in my version of the CRT and is non-reproducible with a standards-adhering CRT, that's an acceptable answer too.
Does setvbuf(_IONBF) disable pipe blocking?

setvbuf() affects the buffering of the I/O stream passed to it. In your case, this is not the pipe itself, but rather a stream wrapped around it by _fdopen().
But I guess you're asking whether you should expect setting the stream to unbuffered mode to affect how fscanf() matches data from the stream, and in particular, whether it is consistent for fscanf() to match (or fail to match) the same data differently when the stream is unbuffered than when it is buffered.
Bottom line: the C language specification provides no reason to expect such a difference, and I find it surprising. Your observations could nevertheless be construed as conforming to spec, subject to some additional provisos.
The C stdio functions do not have a sense of non-blocking I/O (not to be confused with unbuffered I/O), and the language spec contains no provisions for specifying such a mode. It certainly does not say that streams operating in unbuffered mode exhibit non-blocking characteristics, or that using setvbuf() to put a stream into unbuffered mode would engage non-blocking behavior. But neither does it say that that does not happen.
C does not preclude interactions with a stream from encountering transient errors, such as might arise from lower-level non-blocking I/O operations against a pipe underlying the stream. If such a condition were construed by the runtime as a read error, however, then spec conformance would demand that the runtime set the stream's error indicator. This arises from C99 7.19.3/11:
The byte input functions read characters from the stream as if by successive calls to the fgetc function.

in light of C99 7.19.7.1/3:

If a read error occurs, the error indicator for the stream is set and the fgetc function returns EOF.
Once a stream's error indicator is set, the only ways the spec provides for it to be cleared are the clearerr() and rewind() functions. However, unlike the end-of-file indicator, the error indicator can be informational only. That is, nothing in the spec obligates the stdio functions to interpret the error indicator being set as constituting an I/O error in itself. Although the indicator should not be reset on a successful read, it is not inconsistent with the spec for successful reads to be performed from a stream whose error indicator is set.
Now as for fscanf() in particular:

The fscanf function executes each directive of the format in turn. If a directive fails, as detailed below, the function returns. Failures are described as input failures (due to the occurrence of an encoding error or the unavailability of input characters), or matching failures (due to inappropriate input). [...]

A directive that is an ordinary multibyte character is executed by reading the next characters of the stream. If any of those characters differ from the ones composing the directive, the directive fails and the differing and subsequent characters remain unread. Similarly, if end-of-file, an encoding error, or a read error prevents a character from being read, the directive fails.

(C99 7.19.6.2/4,6)
That applies to each literal, non-whitespace character in your format individually.
fscanf() is specified to work within the limitations of C's stream abstraction, which does not assume that all streams can be repositioned arbitrarily, and which does not guarantee capacity for more than one byte of pushback. When a read error occurs, there is no current character to push back, and the previously-read characters are beyond pushback, so whatever was previously consumed from the stream remains consumed.
So yes, fscanf() might fail after having read data matching only some of your format, leaving other data matching subsequent parts of the format waiting to be read. In your particular case, that would require encountering (something it construes as) a read error. This would require the stream's error indicator to be set. Also, fscanf should categorize it as an "input failure", and the spec says:

The fscanf function returns the value of the macro EOF if an input failure occurs before any conversion.
(C99 7.19.6.2/16)
Note that "conversion" is performed in response to "conversion specifications" in the format, such as your %d
. Matching literal characters or whitespace does not count as conversion, so the above is what applies to the behavior you describe.
If your first fscanf() call indeed does return EOF, and if afterward ferror(stdout_) returns nonzero and feof(stdout_) returns 0, then it is consistent with the spec for it to have consumed only part of the initial "File header v1.0", or to have consumed all of it but left one or more whitespace characters at the head of the stream.
Your second fscanf() call could, similarly, consume part of "Execution:", leaving the tail of that line waiting to be read, provided that it returns EOF, and that when it returns, the stream's error indicator is set and its eof indicator is unset.
This is independent of the provenance of the stream or the nature of the perceived error.
Does fscanf offer a guarantee of synchronous, blocking behaviour?

Not as such. However, the only asynchronous behavior recognized by C99 is signal handling, and C99 does not, itself, have any sense of non-blocking I/O. Moreover, any such exposure encountered in practice would be coming from below fscanf(), at the underlying implementation of the stream abstraction. I would account a C implementation or third-party library that surprised programmers with exposure to asynchronous or non-blocking I/O as being of very poor quality in that regard.
On the other hand, you might have been less surprised if you had been more careful: checking the return value of fscanf(), and checking ferror() and/or feof() when that was warranted. And you might have been better off avoiding fscanf() altogether; it is much more difficult to use safely than typical introductory courses in the C language let on. One of the more common recommendations is to use fgets() combined with sscanf() instead, especially when you are reading line-oriented input. In this case, I speculate that might still get you partial reads, but then you would be in a better position to see what was going on and to (programmatically) adapt to it.
cppreference only says (emphasis mine)

For every conversion specifier other than n, the longest sequence of input characters which does not exceed any specified field width and which either is exactly what the conversion specifier expects or is a prefix of a sequence it would expect, is what's consumed from the stream.

I cannot tell what longest sequence implies.
It implies that fscanf keeps reading characters until it exhausts the field width or finds one that is inconsistent with the conversion specification it is processing. In the latter case, it pushes back that one inconsistent character. For example, if it is processing %4d and the leading characters in the stream are "12345", then it will consume and successfully convert "1234". Or if the leading characters were "--12345", then it would read and consume the first '-', a prefix of a matching sequence, but then have a matching failure at the second '-', which would be pushed back into the stream.
Is there a standard way to explicitly force the CRT at either the integer file descriptor or FILE* level to consider the stream blocking, and to avoid early-returning on scan of partially-available data?

Integer file descriptors are not addressed by the C language specification. They are a POSIX thing, also supported to one extent or another by some other C implementations, but there is no standard way to do anything with them, if by "standard" you're referring to the C language spec. POSIX does have ways to enable and disable non-blocking mode, but it's unclear what, if any, of that is available to you; moreover, POSIX files are opened in blocking mode by default.
Additionally, I observe that although C does provide for setting up streams for unbuffered reading, it's very unusual to do so. Unbuffered writing has considerably more applications, whereas unbuffered reading has non-trivial complications. I'm inclined to suspect that the (mis)behavior you observe is related to faulty handling of unbuffered input, rather than poor-quality handling of non-blocking input.