I am trying to use C# to read a file written with MFC's CArchive. From what I can tell, the format is:
[length of next set of data][data]...etc
I'm still fuzzy on some of the data, though. How do I read dates? What about floats, ints, doubles, etc.?
Also, the [length of next set of data] field can be a byte, a word, or a dword. How do I know which one it will be? For instance, for the string "1.10" the data is:
04 31 2e 31 30
The 04 is the length, obviously, and the rest are the hex values for "1.10". Trivial. Later I have a string that is 41 characters long, but the [length] value is:
00 00 00 29
Why 4 bytes for the length? (0x29 = 41)
The main question is: Is there a spec for the format of CArchive output?
To answer your question about strings: the length value stored in the archive is itself variable-length, depending on the length and encoding of its string. If the string has fewer than 255 characters, one byte is used for the length. If it has 255 to 65533 characters, 3 bytes are used: a 1-byte 0xFF marker followed by a 2-byte word. If it has 65534 or more characters, 7 bytes are used: a 3-byte 0xFF 0xFF 0xFF marker followed by a 4-byte dword (65534 itself cannot go in the word, because 0xFFFE is reserved for the Unicode marker). To make it even more complicated, if the string is Unicode-encoded, the length value is preceded by a 3-byte 0xFF 0xFFFE marker, i.e. the byte 0xFF followed by the word 0xFFFE. So in no combination will you see a 4-byte length by itself; what you showed has to be three 0x00 bytes belonging to something else, followed by a 1-byte string length 0x29.
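To see the byte layouts side by side, here is a minimal C# sketch of the writer side that builds the length prefix for a given character count. `EncodeCArchiveLength` is a hypothetical helper, not an MFC or .NET API. It assumes little-endian output (as on x86/x64 Windows), only handles lengths that fit in a dword, and switches to the dword form at 65534 (0xFFFE) rather than 65535, since a word of 0xFFFE would be read back as the Unicode marker.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not an MFC or .NET API): builds the length prefix that
// precedes string data in a CArchive, assuming little-endian output.
static byte[] EncodeCArchiveLength(uint length, bool unicode)
{
    var prefix = new List<byte>();

    if (unicode)
    {
        // Unicode marker: byte 0xFF, then word 0xFFFE stored little-endian (FE FF).
        prefix.AddRange(new byte[] { 0xFF, 0xFE, 0xFF });
    }

    if (length < 0xFF)
    {
        // Short string: the length fits in a single byte.
        prefix.Add((byte)length);
    }
    else if (length < 0xFFFE)
    {
        // 0xFF escape byte, then a 2-byte little-endian word.
        prefix.Add(0xFF);
        prefix.Add((byte)(length & 0xFF));
        prefix.Add((byte)(length >> 8));
    }
    else
    {
        // 0xFF escape byte, word 0xFFFF, then a 4-byte little-endian dword.
        prefix.AddRange(new byte[] { 0xFF, 0xFF, 0xFF });
        for (int shift = 0; shift < 32; shift += 8)
            prefix.Add((byte)((length >> shift) & 0xFF));
    }

    return prefix.ToArray();
}

Console.WriteLine(BitConverter.ToString(EncodeCArchiveLength(4, false)));   // 04
Console.WriteLine(BitConverter.ToString(EncodeCArchiveLength(41, false)));  // 29
Console.WriteLine(BitConverter.ToString(EncodeCArchiveLength(300, false))); // FF-2C-01
Console.WriteLine(BitConverter.ToString(EncodeCArchiveLength(4, true)));    // FF-FE-FF-04
```

Note that a 41-character Ansi string gets the single byte 0x29, which is why the three 0x00 bytes in the question cannot be part of its length.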
So, the correct way to read a string is as follows:
Assume the string data is Ansi unless told otherwise.

1. Read a byte. If its value is < 255 (0xFF), that is the string length; go to step 3.
2. Read a word. If its value is 0xFFFE, the string data is Unicode; go back to step 1. Otherwise, if its value is < 65535 (0xFFFF), that is the string length; go to step 3. Otherwise, read a dword; that is the string length; go to step 3.
3. Read string-length 8-bit or 16-bit values, depending on whether the string is Ansi or Unicode, and then convert to the desired encoding as needed.
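The steps above translate fairly directly into C# with `BinaryReader`, which always reads little-endian values, matching files written by CArchive on x86/x64 Windows. This is a minimal sketch, not a port of MFC: `ReadCArchiveString` and `ReadChars` are names made up for illustration, and because "Ansi" means whatever code page the writing program used, the caller passes that `Encoding` in (on .NET Core and later, `Encoding.GetEncoding(1252)` only works after registering `CodePagesEncodingProvider.Instance`). The duplicate-marker check is a defensive assumption of this sketch.

```csharp
using System;
using System.IO;
using System.Text;

// Sketch: reads one string written by CArchive, following the steps above.
// ansiEncoding is supplied by the caller and should match the code page of
// the program that wrote the file (an assumption, not stored in the archive).
static string ReadCArchiveString(BinaryReader reader, Encoding ansiEncoding)
{
    bool unicode = false; // assume Ansi until the Unicode marker is seen

    while (true)
    {
        // Step 1: a single byte below 0xFF is the length itself.
        byte b = reader.ReadByte();
        if (b < 0xFF)
            return ReadChars(reader, b, unicode, ansiEncoding);

        // Step 2: read a little-endian word.
        ushort w = reader.ReadUInt16();
        if (w == 0xFFFE)
        {
            if (unicode)
                throw new InvalidDataException("Unicode marker seen twice.");
            unicode = true;
            continue; // back to step 1
        }
        if (w < 0xFFFF)
            return ReadChars(reader, w, unicode, ansiEncoding);

        // Word was 0xFFFF: the real length is the following dword.
        uint dw = reader.ReadUInt32();
        return ReadChars(reader, dw, unicode, ansiEncoding);
    }
}

// Step 3: read `length` 8-bit or 16-bit characters and decode them.
static string ReadChars(BinaryReader reader, uint length, bool unicode, Encoding ansiEncoding)
{
    int byteCount = checked((int)(unicode ? 2L * length : length));
    byte[] data = reader.ReadBytes(byteCount);
    if (data.Length != byteCount)
        throw new EndOfStreamException("String data is truncated.");
    return unicode ? Encoding.Unicode.GetString(data) : ansiEncoding.GetString(data);
}

var demoReader = new BinaryReader(new MemoryStream(new byte[] { 0x04, 0x31, 0x2E, 0x31, 0x30 }));
Console.WriteLine(ReadCArchiveString(demoReader, Encoding.ASCII)); // 1.10
```

For the "00 00 00 29" case in the question, the reader has to be positioned on the 0x29 byte; the three zero bytes belong to whatever field was serialized before the string and must be consumed by the code that reads that field.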