c++ windows file-io posix-api

read() returns the wrong number of bytes read on some systems


I'm trying to solve a file reading issue in a legacy system.

It's a 32-bit Windows application, tested and run only on Windows 7 SP1 64-bit systems that all have the same service packs, SDKs and IDEs installed. The IDE is VS2010 SP1.

Here's the code in question:

#define ANZSEL 20

int ii, bfil, ipos;

// Please don't complain about the goto; this is a legacy codebase and I didn't want to
// rephrase the "== -1" check. I also tried UNC paths, with the same result.
if ((bfil = open("Z:\\whatever.bla", O_RDONLY, 0)) == -1)  { goto end; }

   ii = read(bfil, &some_struct_instance, sizeof(some_struct));
   ipos = _lseek(bfil,0,SEEK_CUR); // ipos shows the correct position here, ie. sizeof(some_struct)
   if (ii == sizeof(some_struct))  {

      ii = read(bfil, &another_struct_instance, sizeof(another_struct)*ANZSEL); // ii here sometimes shows 15 instead of sizeof(another_struct)*ANZSEL
      ipos = _lseek(bfil,0,SEEK_CUR); // ipos always shows the correct value of sizeof(some_struct) + sizeof(another_struct)*ANZSEL
      if (ii == sizeof(another_struct)*ANZSEL)  {

         // should always come here as long as the file is long enough

So as you can see, it should be a plain old direct binary read into some structs. What I observed is that the problem disappears on the system where it previously misbehaved if, when creating the file, I first clear the struct with memset/ZeroMemory, so that all padding bytes are also initialized to 0x00 instead of 0xCC (Microsoft's way of tagging uninitialized stack memory in debug mode).

Although it seems clear to me how I can "properly" solve the issue, namely by specifying O_BINARY in open(), like

if ((bfil = open("Z:\\whatever.bla", O_RDONLY|O_BINARY, 0)) == -1)

I don't have any clue why this can behave so differently. I tried to step through the sources of open() and read() on both systems, but since I rarely have access to the only system where the problem can be reproduced, I haven't found anything yet.

My question therefore is whether anyone can point out why this happens and reference some docs.


Solution

  • This typically happens when the file contains the byte 0x1a (aka Control-Z). Like MS-DOS before it, Windows treats Control-Z as the end-of-file marker for text files. Without O_BINARY, open() uses text mode by default (unless _fmode has been changed), so read() stops returning data as soon as it reaches a 0x1a and reports only the bytes that came before it.

    As you've already found, opening the file in binary mode fixes the problem: the 0x1a is then just another data byte and is no longer interpreted as signaling end of file.
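
To check this on the affected machine, a minimal sketch along the following lines may help. It writes a small file with a single 0x1a byte at a chosen offset and reports how many bytes a low-level read returns in text versus binary mode. The names write_sample, read_count, xopen, xread and xclose are hypothetical helpers introduced only for this illustration, and the file name is up to you. The POSIX branch is there only so the sketch also compiles outside Windows, where no text mode exists and both variants behave identically.

```cpp
#include <cstddef>
#include <cstdio>
#include <fcntl.h>
#include <vector>

#ifdef _WIN32
#include <io.h>
// Thin wrappers over the Microsoft CRT's low-level I/O functions.
static int  xopen(const char* p, int flags)    { return _open(p, flags); }
static int  xread(int fd, void* b, unsigned n) { return _read(fd, b, n); }
static void xclose(int fd)                     { _close(fd); }
static const int kReadText   = _O_RDONLY | _O_TEXT;
static const int kReadBinary = _O_RDONLY | _O_BINARY;
#else
#include <unistd.h>
// POSIX has no text mode, so both flag sets are the same here.
static int  xopen(const char* p, int flags)    { return open(p, flags); }
static int  xread(int fd, void* b, unsigned n) { return static_cast<int>(read(fd, b, n)); }
static void xclose(int fd)                     { close(fd); }
static const int kReadText   = O_RDONLY;
static const int kReadBinary = O_RDONLY;
#endif

// Hypothetical helper: writes `size` bytes of 'A' with one 0x1A at `ctrl_z_at`.
// The file is written in binary mode so the bytes land on disk unchanged.
bool write_sample(const char* path, std::size_t size, std::size_t ctrl_z_at) {
    std::vector<unsigned char> data(size, 0x41);
    if (ctrl_z_at < size) data[ctrl_z_at] = 0x1A;
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return false;
    bool ok = std::fwrite(data.data(), 1, data.size(), f) == data.size();
    return std::fclose(f) == 0 && ok;
}

// Hypothetical helper: opens `path` in text or binary mode, attempts a single
// read of `want` bytes, and returns what the read call reported (-1 on error).
int read_count(const char* path, bool binary, std::size_t want) {
    int fd = xopen(path, binary ? kReadBinary : kReadText);
    if (fd == -1) return -1;
    std::vector<unsigned char> buf(want);
    int n = xread(fd, buf.data(), static_cast<unsigned>(want));
    xclose(fd);
    return n;
}
```

For a file created with write_sample(path, 32, 15), read_count(path, true, 32) should return 32 everywhere. With the Microsoft CRT, read_count(path, false, 32) would be expected to return 15, the number of bytes in front of the 0x1a, which is the kind of short count described in the question.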