c++, endianness, bit-shift, unsigned-integer

C++ reading unsigned int values from byte buffer


After more than 20 years off, I started getting back into coding and am re-learning C++ (a language I practically never used; I used to be an Object Pascal man). So I consider myself a complete n00b at C++.

I'm trying to write a program that reads an entire file into a byte buffer and then reads values out of it. The (numerical) values I expect are 2-byte or 4-byte, unsigned, little-endian (yes, I got the basics down).

Here are my (global) variables:

std::string File_Content = "";  // entire file contents in ByteArray
unsigned __int32 File_Pos = 0;  // position in the File_Content, next Byte about to be read
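
(Not shown in the question: how File_Content gets filled. A minimal sketch, assuming the file is slurped into the string with a std::ifstream opened in binary mode; LoadFile is a hypothetical helper name, not the asker's code.)

#include <fstream>
#include <iterator>
#include <string>

// Hypothetical loader: read the whole file into File_Content in binary
// mode so no newline translation mangles the bytes.
static void LoadFile(const char *szPath)
{
   std::ifstream f(szPath, std::ios::binary);
   File_Content.assign(std::istreambuf_iterator<char>(f),
                       std::istreambuf_iterator<char>());
   File_Pos = 0;  // start reading from the beginning
}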

Here's the function doing the reading (the value is returned by reference, so there's no pointless copying, since some reads return strings, too):

static __int8 Read2Bytes(unsigned __int32 iPos, unsigned __int8 iEndian, unsigned __int16 &iReturn)
{
   if (iEndian == 0)  // Little Endian
   { 
      // Push "second" byte to High Byte
      iReturn = File_Content[iPos] | (File_Content[iPos + 1] << 8);
   }  // end if (iEndian == 0)
   else
   {  // Big Endian, not important right now
   }  // end else (iEndian == 0)

   File_Pos += 2;  // move the Position-Marker along
   return 0;
}  // end Read2Bytes

Here's me calling the function to read a 2-byte int:

unsigned __int16 iTimeStamp = 0;      // 2 bytes
Read2Bytes(File_Pos, 0, iTimeStamp);  // (Position in the FileStream, Endian, Return Value)

Now, the bytes I expect to read from the byte stream are 0xC6 0x9D (in that order), according to WinHex. So, after endianizing, I expect iTimeStamp to come out as 0x9DC6, right?

But in fact, it comes out as 0xFFC6, and for the life of me, I can't figure out why.

I have another function that is supposed to read 4 bytes, little-endian, LowWord-HighWord, and there I have the same problem.

Please, can anyone open my eyes as to why my HighBytes get lost in translation somewhere?


Edit:

I have experimented a bit to debug the problem, and tried this:

static __int8 Read2Bytes(unsigned __int32 iPos, unsigned __int8 iEndian, unsigned __int16 &iReturn)
{
   unsigned __int8 bHiByte, bLoByte;  // both bytes separately

   if (iEndian == 0)  // Little Endian
   { 
      // Not-Working Version
      //iReturn = File_Content[iPos] | (File_Content[iPos + 1] << 8);

      // new attempt, byte-by-byte
      bLoByte = File_Content[iPos];       // read "first" byte
      bHiByte = File_Content[iPos + 1];   // read "2nd" byte
      iReturn = (bHiByte << 8) | bLoByte; // endian them together

   }  // end if (iEndian == 0)

   (Rest unchanged)

and suddenly it works! From the bytes 0xC6 and 0x9D I do get 0x9DC6!

What did I do wrong with my first attempt?

This program is supposed to favor performance, and I don't think declaring 2 extra variables (including their garbage collection) is the fastest way to do this.

Or, is there a different, better way?


Solution

  • Your compiler's char type is signed. char is what you get from std::string.

    You should never shift or bit-fiddle with signed values unless you know exactly what you are doing. Here, 0xC6 read through a signed char is −58; the integer promotion sign-extends it to 0xFFFFFFC6 before the OR, so every bit of the high byte ends up set, and truncating to 16 bits gives exactly the 0xFFC6 you observed. Cast to unsigned char immediately:

    iReturn = (unsigned char)File_Content[iPos] 
            | ((unsigned char)File_Content[iPos + 1] << 8);
    

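    Applying the same cast inside the original function gives something like the sketch below. It is a minimal sketch that keeps the question's signature and MSVC-style __int types; Read4Bytes is an assumed analogue for the 4-byte case mentioned in the question, not code from the question itself.

    static __int8 Read2Bytes(unsigned __int32 iPos, unsigned __int8 iEndian, unsigned __int16 &iReturn)
    {
       if (iEndian == 0)  // Little Endian
       {
          // Cast each char to unsigned char before shifting/ORing so the
          // promotion to int zero-extends instead of sign-extending.
          iReturn = (unsigned char)File_Content[iPos]
                  | ((unsigned char)File_Content[iPos + 1] << 8);
       }

       File_Pos += 2;  // move the position marker along
       return 0;
    }

    // Hypothetical 4-byte analogue, same conventions assumed; the extra
    // cast to unsigned __int32 keeps the shift by 24 out of the sign bit.
    static __int8 Read4Bytes(unsigned __int32 iPos, unsigned __int8 iEndian, unsigned __int32 &iReturn)
    {
       if (iEndian == 0)  // Little Endian
       {
          iReturn =  (unsigned __int32)(unsigned char)File_Content[iPos]
                  | ((unsigned __int32)(unsigned char)File_Content[iPos + 1] << 8)
                  | ((unsigned __int32)(unsigned char)File_Content[iPos + 2] << 16)
                  | ((unsigned __int32)(unsigned char)File_Content[iPos + 3] << 24);
       }

       File_Pos += 4;  // move the position marker along
       return 0;
    }

    As for the performance worry: the two extra locals in your edited version work for the same reason (unsigned __int8 never sign-extends), C++ has no garbage collection for them, and any optimizing compiler will keep them in registers, so neither version should be measurably slower than the other.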