Prata's book Programming Language C 6th Edition says the following:
C guarantees that when it allocates space for an array, a pointer to the first location after the end of the array is a valid pointer.
1) But a pointer can have any address, no?
Example:
#include <stdio.h>
int main(void)
{
    int m[10] = {0,1,2,3,4,5,6,7,8,9};
    int * p = m + 100;
    return 0;
}
2) This code does not produce any errors, therefore, C guarantees that the pointer can point not only to the first element after the end of the array, but also to the 100th element after the end of the array, right?
Can you quote the wording from the C Standard?
Thank you in advance!
The question asked here is different from this one, but my answer to that one largely answers this one, so I reproduce it here, with edits.
As specified by the C standard, pointers are not merely numbers or memory locations that you can manipulate freely; they have limits.
In large part, you can think of pointers as numbers, and as addresses in memory, provided (a) you understand that pointer subtraction converts the difference from bytes to elements (of the type of the pointers being subtracted), and (b) you understand the limits where this model breaks.
Per the C 2024 standard, clause 6.5.7, you may add an integer to or subtract an integer from a pointer into an array only if the result points to an element of that array or to one past its last element; otherwise, the behavior is undefined. (This comes from a lengthy paragraph, so I will not quote it, the one beginning “When an expression that has integer type is added to or subtracted from a pointer”.) The same clause, in the paragraph beginning “When two pointers are subtracted”, says you may subtract two pointers that point to elements of the same array or to one past the last element of the array. So, if you have int a[8], b[4];, you may subtract a pointer to a[5] from a pointer to a[2], because a[5] and a[2] are elements in the same array. You may also subtract a pointer to a[5] from a pointer to a[8], because a[8] is one past the last element of the array. (a[8] is not in the array; a[7] is the last element.) You may not subtract a pointer to a[5] from a pointer to b[2], because a[5] is not in the same array as b[2]. Or, more accurately, if you do such a subtraction, the behavior is undefined. Note that it is not merely the result that is unspecified; you cannot expect that you will get some possibly nonsensical number as a result: The behavior is undefined. According to the C standard, this means that the C standard does not say anything about what occurs as a consequence. Your program could give you a reasonable answer, or it could abort, or it could delete files, and all those consequences would be in conformance to the C standard.
If you do a defined subtraction, then the result is the number of elements from the second pointed-to element to the first pointed-to element. Thus, &a[5] - &a[2] is 3, and &a[2] - &a[5] is −3. This is true regardless of the element type of a. The C implementation is required to convert the distance from bytes (or whatever units it uses) into elements of the appropriate type. If a is an array of double of eight bytes each, then &a[5] - &a[2] is 3, for 3 elements. If a is an array of char of one byte each, then &a[5] - &a[2] is 3, for 3 elements.
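As a minimal sketch of the point above (the function names are made up for illustration), the following shows that the difference of two pointers into the same array is counted in elements, whatever the element size, and that the byte distance only appears if you subtract char pointers yourself:

```c
#include <stddef.h>

/* Element distance between &a[5] and &a[2] in an array of double. */
ptrdiff_t distance_in_doubles(void)
{
    double a[8] = {0};
    return &a[5] - &a[2];   /* counted in double elements, not bytes */
}

/* The same subtraction in an array of char gives the same element count. */
ptrdiff_t distance_in_chars(void)
{
    char a[8] = {0};
    return &a[5] - &a[2];
}

/* Byte distance, obtained by subtracting char pointers into the same array. */
ptrdiff_t distance_in_bytes(void)
{
    double a[8] = {0};
    return (char *)&a[5] - (char *)&a[2];   /* 3 * sizeof(double) */
}

/* a + 8 is one past the last element: it may be formed and subtracted,
   but it must not be dereferenced. */
ptrdiff_t distance_to_end(void)
{
    double a[8] = {0};
    return (a + 8) - &a[5];
}
```

The first, second, and fourth functions all return 3; only the third depends on sizeof(double).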
Why would pointers ever not be just numbers? On some computers, especially older computers, addressing memory was more complicated. Early computers had small address spaces. When the manufacturers wanted to make bigger address spaces, they also wanted to maintain some compatibility with old software. They also had to implement various schemes for addressing memory, due to hardware limitations, and those schemes may have involved moving data between memory and disk or changing special registers in the processor that controlled how addresses were converted to physical memory locations. For pointers to work on machines like that, they have to contain more information than just a simple address. Because of this, the C standard does not just define pointers as addresses and let you do arithmetic on the addresses. Only a reasonable amount of pointer arithmetic is defined, and the C implementation is required to provide the necessary operations to make that arithmetic work and is not required to provide more than that.
Even on modern machines, there can be complications. On Digital’s Alpha processors, a pointer to a function does not contain the address of the function. It is the address of a descriptor of the function. That descriptor contains the address of the function, and it contains some additional information that is necessary to call the function correctly.
With regard to relational operators, such as >, the C standard says, in 6.5.9 (“When two pointers are compared…”), that you may compare the same pointers you may subtract, as described above, and you may also compare pointers to members of the same struct or union object. Pointers to elements of an array (or its end address) compare in the expected way: Pointers to higher-indexed elements are greater than pointers to lower-indexed elements. Pointers to two members of the same union compare equal. For pointers to two members of a struct, the pointer to the member declared later is greater than the pointer to the member declared earlier.
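A short sketch of those three comparison rules, using made-up types (Pair, Either) and function names. The members are given the same type so the pointers can be compared without casts:

```c
#include <stdbool.h>

/* Hypothetical types, for illustration only. */
struct Pair   { int first; int second; };
union  Either { int i; int j; };

/* Array elements: higher index compares greater, and the
   one-past-the-end pointer compares greater than every element. */
bool array_order_holds(void)
{
    int a[8] = {0};
    return &a[2] < &a[5] && &a[5] < a + 8;
}

/* Struct members: the member declared later compares greater. */
bool struct_order_holds(void)
{
    struct Pair p = {0, 0};
    return &p.first < &p.second;
}

/* Union members: pointers to members of the same union compare equal,
   so neither is less than the other. */
bool union_members_equal(void)
{
    union Either u = {0};
    return &u.i <= &u.j && &u.i >= &u.j;
}
```

Each function returns true on a conforming implementation. Comparing, say, a pointer into one local array with a pointer into a different one using < is not covered by these rules.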
As long as you stay within the constraints above, you can think of pointers as numbers that are memory addresses.
Usually, it is easy for a C implementation to provide the behavior required by the C standard. Even if a computer has a compound pointer scheme, such as a base address and offset, usually all elements of an array will use the same base address as each other, and all elements of a struct will use the same base address as each other. So the compiler can simply subtract or compare the offset parts of the pointer to get the desired difference or comparison.
However, if you subtract pointers to different arrays on such a computer, you can get strange results. It is possible for the bit pattern formed by a base address and offset to appear greater (when interpreted as a single integer) than another pointer even though it points to a lower address in memory. This is one reason you must stay within the rules set by the C standard.
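To make that concrete, here is a toy model of a base-and-offset pointer. It assumes the 8086 real-mode scheme, where the memory location is segment * 16 + offset; the struct and function names are invented for this sketch, and no C implementation is required to represent pointers this way.

```c
#include <stdint.h>

/* Hypothetical compound pointer: a 16-bit segment (base) and a 16-bit offset. */
struct SegPtr {
    uint16_t segment;
    uint16_t offset;
};

/* Memory location actually addressed, using the 8086 real-mode rule. */
uint32_t linear_address(struct SegPtr p)
{
    return (uint32_t)p.segment * 16u + p.offset;
}

/* The pointer's bits read as a single 32-bit integer, segment in the high half. */
uint32_t raw_bits(struct SegPtr p)
{
    return ((uint32_t)p.segment << 16) | p.offset;
}

/* Returns 1 when x looks greater than y as a bit pattern
   but actually addresses a lower location in memory. */
int orders_disagree(struct SegPtr x, struct SegPtr y)
{
    return raw_bits(x) > raw_bits(y) && linear_address(x) < linear_address(y);
}
```

For example, with x = {0x1000, 0x0000} and y = {0x0FFF, 0x0020}, x addresses location 0x10000 and y addresses 0x10010, yet the bit pattern of x (0x10000000) is greater than that of y (0x0FFF0020), so orders_disagree(x, y) returns 1. Within a single array, where the segment parts match, comparing offsets alone gives the right order.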
1) But a pointer can have any address, no?
No, when you go outside the limits specified by the C standard, the standard does not guarantee a pointer value will work.
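For contrast, here is a sketch of what the guarantee quoted from Prata's book is for: the one-past-the-end pointer may be formed and compared, which makes it usable as a loop bound, but it is never dereferenced. The function names are made up for illustration.

```c
#include <stddef.h>

/* Sums count ints starting at first. The caller must pass a count that
   does not exceed the number of elements available from first. */
int sum_ints(const int *first, size_t count)
{
    const int *end = first + count;   /* at most one past the last element */
    int total = 0;
    for (const int *p = first; p != end; ++p)
        total += *p;                  /* p is always inside the array here */
    return total;
}

/* Uses the array from the question: m + 10 is the end pointer. */
int sum_question_array(void)
{
    int m[10] = {0,1,2,3,4,5,6,7,8,9};
    return sum_ints(m, 10);           /* m + 11 or m + 100 would be undefined */
}
```

Here sum_question_array returns 45, and the only pointer formed beyond the array is m + 10.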
Further, compilers take advantage of the standard’s rules to apply optimizations. If your code contains:
void MyFunction(int n, /* other stuff… */)
{
    int Array[100];
    void *p = Array + n;
    …
}
then the compiler is allowed to deduce that n is greater than or equal to 0 and less than or equal to 100, even if you never use p. Then, if you have a for loop or an if that uses the value of n, the optimizer might make deductions about that statement that are true given the above deduction but that are false if you pass an n outside those bounds. So the optimizer might change the code that is generated so that it only works when the deduction is true.
This can result in your program breaking in ways that are mysterious to you.
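One way to stay inside the rules, sketched here with a made-up helper and a made-up ARRAY_LENGTH constant, is to test the index before any pointer is computed from it, so that an out-of-range pointer is never formed:

```c
#include <stddef.h>

#define ARRAY_LENGTH 100   /* assumed length, matching int Array[100] above */

/* Hypothetical helper: returns array + n when 0 <= n <= ARRAY_LENGTH
   (n == ARRAY_LENGTH gives the one-past-the-end pointer), otherwise NULL.
   array must point to the first of ARRAY_LENGTH ints. */
int *checked_offset(int *array, int n)
{
    if (n < 0 || n > ARRAY_LENGTH)
        return NULL;       /* rejected before array + n is ever evaluated */
    return array + n;
}
```

Because the check happens before the addition, the compiler has no undefined behavior from which to draw conclusions about n, and the caller gets a value it can test.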
2) This code does not produce any errors, therefore, C guarantees that the pointer can point not only to the first element after the end of the array, but also to the 100th element after the end of the array, right?
No, the fact that the compiler compiled the code without reporting any issues and the program ran without reporting any errors means only that no errors were reported. It does not mean no errors exist.