Write values of different data types in sequence in memory? or, Array with multiple data types?

I am relatively new to writing in C. I have self taught myself using what resources I have found online and in print. This is my first real project in C programming. Gotta love on-the-job training.

I am writing some code in C that is being used on a Texas Instruments C6701 Digital Signal Processor. Specifically, I am writing a set of communication functions to interface through a serial port.

The project I'm on has an existing packet protocol for sending data through the serial port. This works by handing over a pointer to the data to be transmitted and its length in bytes. All I have to do is write in the bytes to be transmitted into an "array" in memory (the transmitter copies that sequence of bytes into a buffer and transmits that).

My question pertains to how best to format the data to be transmitted, the data I have to send is composed of several different data types (unsigned char, unsigned int, float etc...). I can't expand everything up to float (or int) because I have a constrained communication bandwidth and need to keep packets as small as possible.

I originally wanted to use arrays to format the data,

unsigned char* dataTx[10];
dataTx[0]=char1;
dataTx[1]=char2;
etc...

This would work except not all my data is char, some is unsigned int or unsigned short.

To handle short and int I used bit shifting (lets ignore little-endian vs big-endian for now).

unsigned char* dataTx[10];
dataTx[0]=short1>>8;
dataTx[1]=short1;
dataTx[2]=int1>>24;
dataTx[3]=int1>>16;
etc...

However, I believe another (and better?) way to do this is to use pointers and pointer arithmetic.

unsigned char* dataTx[10]
*(dataTx+0) = int1;
*(dataTx+4) = short1;
*(dataTx+6) = char1;
etc...

My question (finally) is, is which method (bit shifting or pointer arithmetic) is the more acceptable method? Also, is one faster to run? (I also have run-time constraints).

My requirement: The data be located in memory serially, without gaps, breaks or padding.

I don't know enough about structures yet to know if a structure would work as a solution. Specifically, I don't know if a structure always allocates memory locations serially and without breaks. I read something that indicates they allocates in 8 byte blocks, and possibly introduce padding bytes.

Right now I'm leaning towards the pointer method. Thanks for reading this far into what seems to be a long post.

Solution

Usually you would use the bit shifting approach, because many chips do not allow you to copy, for example, a 4-byte integer to an odd byte address (or, more accurately, to a set of 4 bytes starting at an odd byte address). This is called alignment. If portability is an issue, or if your DSP does not allow misaligned access, then shifting is necessary. If your DSP incurs a significant performance hit for misaligned access, you might worry about it.

However, I would not write the code with the shifts for the different types done longhand as shown. I would expect to use functions (possibly inline) or macros to handle both the serialization and deserialization of the data. For example:

unsigned char dataTx[1024];
unsigned char *dst = dataTx;

dst += st_int2(short1, dst);
dst += st_int4(int1, dst);
dst += st_char(str, len, dst);
...

In function form, these functions might be:

size_t st_int2(uint16_t value, unsigned char *dst)
{
    *dst++ = (value >> 8) & 0xFF;
    *dst   = value & 0xFF;
    return 2;
}

size_t st_int4(uint32_t value, unsigned char *dst)
{
    *dst++ = (value >> 24) & 0xFF;
    *dst++ = (value >> 16) & 0xFF;
    *dst++ = (value >>  8) & 0xFF;
    *dst   = value & 0xFF;
    return 4;
}

size_t st_char(unsigned char *str, size_t len, unsigned char *dst)
{
    memmove(dst, str, len);
    return len;
}

Granted, such functions make the code boring; on the other hand, they reduce the chance for mistakes too. You can decide whether the names should be st_uint2() instead of st_int2() -- and, indeed, you can decide whether the lengths should be in bytes (as here) or in bits (as in the parameter types). As long as you're consistent and boring, you can do as you will. You can also combine these functions into bigger ones that package entire data structures.

The masking operations (& 0xFF) may not be necessary with modern compilers. Once upon a very long time ago, I seem to remember that they were necessary to avoid occasional problems with some compilers on some platforms (so, I have code dating back to the 1980s that include such masking operations). Said platforms have probably gone to rest in peace, so it may be pure paranoia on my part that they're (still) there.

Note that these functions are passing the data in big-endian order. The functions can be used 'as is' on both big-endian and little-endian machines, and the data will be interpreted correctly on both types, so you can have diverse hardware talking over the wire, using this code, and there will be no miscommunication. If you have floating point values to convey, you have to worry a bit more about the representations over the wire. Nevertheless, you should probably aim to have the data transferred in a platform-neutral format so that interworking between chip types is as simple as possible. (This is also why I used the type sizes with numbers in them; 'int' and 'long' in particular can mean different things on different platforms, but 4-byte signed integer remains a 4-byte signed integer, even if you are unlucky - or lucky - enough to have a machine with 8-byte integers.)