arrays c allocation pointer-arithmetic

What are use cases for writing (&var + 1) if var is not an array element?


Recently I learned from user "chux" that it is legal to add 1 to an address that doesn't represent an array element. Specifically, the following provision in the standard (C17 draft, 6.5.6 ¶7)

For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.

makes it legal to write &var + 1 where var is not representable as arr[i] for some T arr[n] where 0 ≤ i < n.

What are use cases for doing this? I found an example by Aaron Ballman (on the SEI CERT C Coding Standard website) who mentions "allocation locality". Without quoting his entire example, the essence seems to be that one can allocate space for multiple objects using a single call to malloc, so that one can assign to them like this:

T1 *objptr1 = (T1 *)malloc(sizeof(T1) + sizeof(*objptr2));
*objptr1 = ...;
memcpy(objptr1 + 1, objptr2, sizeof(*objptr2));

Here is a toy example of mine:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    float obj2 = 432.1f;
    /* One allocation holding a long followed directly by a float. */
    long *objptr1 = (long *)malloc(sizeof(*objptr1) + sizeof(obj2));
    if (objptr1 == NULL) {
        return EXIT_FAILURE;
    }
    *objptr1 = 123456789L;
    memcpy(objptr1 + 1, &obj2, sizeof(obj2));

    printf("%ld\n", *objptr1); // 123456789
    printf("%f\n", *(float *)(objptr1 + 1)); // 432.100006

    free(objptr1);
    return 0;
}

I hope that this captures the essence of the idiom. (Perhaps it does not: As a commenter pointed out, my toy example assumes that the alignment of float is smaller than or equal to the alignment of long. The original example by Aaron Ballman had a string as the second object, and strings can be arbitrarily aligned. For a correct minimal (toy) version of Aaron Ballman's code stub see my own answer here.)
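To make the alignment point concrete, here is a minimal sketch of the string variant, where the second object has alignment 1 and can therefore start at any address. This is not Aaron Ballman's code; struct record, record_create and record_name are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header type, used only for illustration. */
struct record {
    long id;
};

/* Allocate a record and a copy of `name` in a single block.
   The string is stored at (char *)(rec + 1), i.e. just past the record.
   char has alignment 1, so no padding is needed between the two objects.
   Returns NULL on allocation failure; release the block with free(). */
struct record *record_create(long id, const char *name)
{
    assert(name != NULL);
    size_t len = strlen(name) + 1;          /* include the terminator */
    struct record *rec = malloc(sizeof *rec + len);
    if (rec == NULL)
        return NULL;
    rec->id = id;
    memcpy(rec + 1, name, len);             /* rec + 1: one past *rec */
    return rec;
}

/* The name stored directly behind the record. */
const char *record_name(const struct record *rec)
{
    return (const char *)(rec + 1);
}
```

A caller would write something like struct record *r = record_create(42L, "chux"); and later a single free(r) releases both the record and its string.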

However, it seems that one could also simply use a (char *)-cast with sizeof instead:

    memcpy((char *)objptr1 + sizeof(*objptr1), &obj2, sizeof(obj2));

In the general case, &var + 1 is shorter than (char *)&var + sizeof var, so perhaps this is the advantage.
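As a quick check that the two spellings really denote the same address, here is a small sketch. The function name bytes_to_one_past is made up for this illustration, and double is an arbitrary choice of type.

```c
#include <assert.h>
#include <stddef.h>

/* Byte distance from &var to &var + 1 for a standalone double
   (one that is not an element of any array). */
size_t bytes_to_one_past(void)
{
    double var = 0.0;
    double *typed = &var + 1;                 /* arithmetic in units of double */
    char *bytes = (char *)&var + sizeof var;  /* arithmetic in units of bytes  */
    assert((char *)typed == bytes);           /* same address, two spellings   */
    return (size_t)((char *)typed - (char *)&var);
}
```

The returned distance equals sizeof(double), so the typed form saves the cast and the sizeof, but computes nothing different.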

But is that all? What are use cases for writing (&var + 1) if var is not an array element?


Solution

  • What are use cases for writing (&var + 1) if var is not an array element?

    Not everything that falls out of the language semantics has a specific use. Most computer languages are designed for consistency and sufficiency. Some also aim for simplicity. Few, however, expressly target minimality, and C is not one of them.

    The primary reason that pointer arithmetic is defined for pointers to scalars is that it makes it easier to define pointer arithmetic. Pointers to scalars are not a special case, which is good, because it's not necessarily possible to distinguish them from pointers to array elements (alternatively: implementations don't need to make that possible). Furthermore, making pointers to scalars equivalent to pointers to the single element of a one-element array is unproblematic, because the pointer types are the same and the representation of a scalar is identical to the representation of a one-element array of the same data type.
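    A small sketch of why the lack of a special case is convenient: the hypothetical function sum_ints below takes a pointer and a count, and it neither knows nor cares whether the pointer refers to a standalone int or to an array element.

```c
#include <assert.h>
#include <stddef.h>

/* Sum `n` ints starting at `p`. The function cannot tell, and does not
   need to know, whether `p` points at a standalone int or into an array. */
long sum_ints(const int *p, size_t n)
{
    assert(p != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += p[i];
    return total;
}
```

    With int scalar = 7; and int arr[3] = {1, 2, 3};, the calls sum_ints(&scalar, 1), sum_ints(arr, 3) and sum_ints(&arr[1], 1) are all valid and go through exactly the same code path.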

    Given that pointer arithmetic is defined for pointers to scalars by relying on a semantic equivalence between scalars and single-element arrays, the use cases for &scalar + 1 are exactly the same as those for &single_element_array[0] + 1, in contexts where one wants to lean on that semantic equivalence. In turn, those cases are pretty much the same as the ones for &n_element_array[n-1] + 1 generally.

    Perhaps a better question, then, would be why the language allows computing a pointer to just past the end of an array, and what use that might have. As far as I am aware or have ever been able to determine, the reasons are primarily a matter of convenience. For example, it is easier to iterate over an array via pointers if you are permitted to compute (but not dereference) a pointer to just past the end of the array. And it is desirable to be able to express sub-arrays via an [inclusive_start, exclusive_end) pointer pair. Neither of those things is essential, however.
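    Both conveniences can be sketched in a few lines. The function name sum_range is hypothetical; the point is only that the end pointer is compared against but never dereferenced.

```c
#include <assert.h>
#include <stddef.h>

/* Sum the ints in the half-open range [first, last).
   `last` may be a one-past-the-end pointer: it is compared, never dereferenced. */
long sum_range(const int *first, const int *last)
{
    assert(first != NULL && last != NULL);
    long total = 0;
    for (const int *p = first; p != last; ++p)
        total += *p;
    return total;
}
```

    For int arr[4], the whole array is sum_range(arr, arr + 4) and a sub-array is sum_range(arr + 1, arr + 3). Because a scalar behaves like a one-element array, sum_range(&scalar, &scalar + 1) works as well, which is the &var + 1 case from the question.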