After more than 20 years away from coding, I started getting back into it and began re-learning C++ (a language I had practically never used; I used to be an Object Pascal man). So, I consider myself a complete n00b at C++.
I am trying to write a program that reads an entire file into a byte stream and then reads values out of it. The (numerical) values I expect are 2 bytes or 4 bytes, unsigned, little-endian (yes, I got the basics down).
Here are my (global) variables:
std::string File_Content = ""; // entire file contents as a byte array
unsigned __int32 File_Pos = 0; // position in File_Content: the next byte to be read
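In case it matters, File_Content gets filled in one go, roughly like this (a simplified sketch, not my exact code; "input.bin" is a placeholder name):

#include <fstream>
#include <iterator>
#include <string>

// Simplified sketch: slurp the whole file into File_Content in binary mode
std::ifstream fin("input.bin", std::ios::binary);
File_Content.assign(std::istreambuf_iterator<char>(fin),
                    std::istreambuf_iterator<char>());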
Here's the function that does the reading (the result is returned by reference, so there's no pointless copying; some values are strings, too):
static __int8 Read2Bytes(unsigned __int32 iPos, unsigned __int8 iEndian, unsigned __int16 &iReturn)
{
    if (iEndian == 0) // Little Endian
    {
        // Push "second" byte to High Byte
        iReturn = File_Content[iPos] | (File_Content[iPos + 1] << 8);
    } // end if (iEndian == 0)
    else
    { // Big Endian, not important right now
    } // end else (iEndian == 0)
    File_Pos += 2; // move the Position-Marker along
    return 0;
} // end Read2Bytes
Here's me calling the function to read a 2-byte int:
unsigned __int16 iTimeStamp = 0; // 2 bytes
Read2Bytes(File_Pos, 0, iTimeStamp); // (position in the file stream, endian, return value)
Now, the bytes I expect to read from the byte stream read as 0xC6 0x9D (in that order), according to WinHex. So, after endianizing, I expect iTimeStamp to return as 0x9DC6, right?
But in fact, it returns as 0xFFC6, and for the life of me, I can't figure out why.
I have another function that is supposed to read 4 bytes, little-endian, LowWord-HighWord, and there I have the same problem.
Please, can anyone open my eyes as to why my high bytes get lost in translation somewhere?
Edit:
I have experimented a bit to debug the problem, and tried this:
static __int8 Read2Bytes(unsigned __int32 iPos, unsigned __int8 iEndian, unsigned __int16 &iReturn)
{
    unsigned __int8 bHiByte, bLoByte; // both bytes, separately
    if (iEndian == 0) // Little Endian
    {
        // Not-working version:
        //iReturn = File_Content[iPos] | (File_Content[iPos + 1] << 8);
        // New attempt, byte by byte:
        bLoByte = File_Content[iPos];     // read "first" byte
        bHiByte = File_Content[iPos + 1]; // read "2nd" byte
        iReturn = (bHiByte << 8) | bLoByte; // endian them together
    } // end if (iEndian == 0)
(Rest unchanged)
and suddenly it works! From the bytes 0xC6 and 0x9D I do get 0x9DC6!
What did I do wrong with my first attempt?
This program is supposed to favor performance, and I don't think declaring 2 extra variables (including their garbage collection) is the fastest way to do this.
Or, is there a different, better way?
Your compiler's char type is signed, and char is what you get from std::string.
You should never shift or bit-fiddle with signed-type values unless you know extremely well what you are doing. Cast to unsigned char immediately:
iReturn = (unsigned char)File_Content[iPos]
        | ((unsigned char)File_Content[iPos + 1] << 8);
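To see exactly where the 0xFF comes from, walk through your original expression with your bytes: File_Content[iPos + 1] is 0x9D, which as a signed char is -99. Before the shift it is promoted to int, and that promotion sign-extends it to 0xFFFFFF9D; shifted left by 8 it becomes 0xFFFF9D00. OR that with the likewise sign-extended 0xFFFFFFC6, and you get 0xFFFFFFC6, which truncates to 0xFFC6 when stored into your 16-bit result. Your byte-by-byte version works because assigning into unsigned __int8 chops the values back down to 0xC6 and 0x9D before any arithmetic happens.

Here is a small self-contained demo of both expressions (a sketch with the two byte values from your file hard-coded; assumes a platform where plain char is signed, which yours evidently is):

#include <cstdio>

int main()
{
    char lo = (char)0xC6; // what File_Content[iPos]     holds
    char hi = (char)0x9D; // what File_Content[iPos + 1] holds

    // Broken: both chars are sign-extended to int before | and <<
    unsigned __int16 bad  = lo | (hi << 8);                               // 0xFFC6
    // Fixed: casting to unsigned char makes the promotion zero-extend
    unsigned __int16 good = (unsigned char)lo | ((unsigned char)hi << 8); // 0x9DC6

    printf("bad = 0x%04X, good = 0x%04X\n", (unsigned)bad, (unsigned)good);
    return 0;
}

This also answers your performance worry: with the casts you need no extra variables at all (and C++ has no garbage collection; locals like yours are essentially free anyway). Strictly speaking, left-shifting a negative value is undefined behaviour before C++20, which is one more reason to cast first. Your 4-byte reader needs the same treatment; just make sure the byte you shift left by 24 is widened to an unsigned 32-bit type first, e.g. (unsigned __int32)(unsigned char)File_Content[iPos + 3] << 24.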