I've been spending my spare time writing C for fun lately.
Along the way I ran into something interesting, and I still do not understand why it happens.
max_arr_count_index takes on whatever value I assign to arr[5], which is one element past the end of the array.
Can someone explain this to me? I know it should not be done: writing one index past the end of the array (here, arr[5] = 30 in the problem case) is not safe, and it is undefined behavior according to the standard.
I am not going to do this in real code, but I want to understand what happens under the hood.
Clang and GCC give me the same result.
The code and results are below:
[No-problem case: I do not assign a value past the end of the array]
#include <stdio.h>
int arr[] = {11,33,55,77,88};
int max_arr_count_index = (sizeof(arr) / sizeof(arr[0]));
// print all
void print_all_arr(int* arr)
{
// just print all arr datas regarding index.
for(int i = 0; i < max_arr_count_index; i++) {
printf("arr[%d] = %d \n", i, arr[i]);
}
}
int main(int argc, const char * argv[]) {
// insert code here...
printf("[before]max_arr_count_index : %d\n", max_arr_count_index);
printf("[before]The original array elements are :\n");
print_all_arr(arr);
arr[0] = 1;
arr[1] = 2;
arr[2] = 3;
arr[3] = 4;
arr[4] = 5;
// arr[5] = 1000;
printf("[after]max_arr_count_index : %d\n", max_arr_count_index);
printf("[after]The array elements after :\n");
print_all_arr(arr);
return 0;
}
The result of the no-problem case is below:
[before]max_arr_count_index : 5
[before]The original array elements are :
arr[0] = 11
arr[1] = 33
arr[2] = 55
arr[3] = 77
arr[4] = 88
[after]max_arr_count_index : 5
[after]The array elements after :
arr[0] = 1
arr[1] = 2
arr[2] = 3
arr[3] = 4
arr[4] = 5
Program ended with exit code: 0
[Problem case: I assign a value one past the end of the array]
#include <stdio.h>
int arr[] = {11,33,55,77,88};
int max_arr_count_index = (sizeof(arr) / sizeof(arr[0]));
// print all
void print_all_arr(int* arr)
{
// just print all arr datas regarding index.
for(int i = 0; i < max_arr_count_index; i++) {
printf("arr[%d] = %d \n", i, arr[i]);
}
}
int main(int argc, const char * argv[]) {
// insert code here...
printf("[before]max_arr_count_index : %d\n", max_arr_count_index);
printf("[before]The original array elements are :\n");
print_all_arr(arr);
arr[0] = 1;
arr[1] = 2;
arr[2] = 3;
arr[3] = 4;
arr[4] = 5;
/* Point is this one.
If I assign arr[5] 30, then, max_arr_count_index is changed also as
30. if I assign arr[5] 10000 max_arr_count_index is assigned 10000.
*/
arr[5] = 30;
/* Point is this one.
If I assign arr[5] 30, then, max_arr_count_index is changed also as
30. if I assign arr[5] 10000 max_arr_count_index is assigned 10000.
*/
printf("[after]max_arr_count_index : %d\n", max_arr_count_index);
printf("[after]The array elements after arr[5] is assigned 30 :\n");
print_all_arr(arr);
return 0;
}
Result is below :
[before]max_arr_count_index : 5
[before]The original array elements are :
arr[0] = 11
arr[1] = 33
arr[2] = 55
arr[3] = 77
arr[4] = 88
[after]max_arr_count_index : 30
[after]The array elements after arr[5] is assigned 30 :
arr[0] = 1
arr[1] = 2
arr[2] = 3
arr[3] = 4
arr[4] = 5
arr[5] = 30
arr[6] = 0
arr[7] = 0
arr[8] = 0
arr[9] = 0
arr[10] = 0
arr[11] = 0
arr[12] = 0
arr[13] = 0
arr[14] = 0
arr[15] = 0
arr[16] = 0
arr[17] = 0
arr[18] = 0
arr[19] = 0
arr[20] = 0
arr[21] = 0
arr[22] = 0
arr[23] = 0
arr[24] = 0
arr[25] = 0
arr[26] = 0
arr[27] = 0
arr[28] = 0
arr[29] = 0
Program ended with exit code: 0
So obviously, as far as the C standard is concerned, this is undefined behaviour, and the compiler could make demons fly out of your nose and it would be fine-ish.
But you asked to look "under the hood", so we essentially have to look at the assembly output. An excerpt (produced with gcc -g -o test test.c and objdump -S --disassemble test) is:
int main(int argc, const char * argv[]) {
743: 55 push %rbp
744: 48 89 e5 mov %rsp,%rbp
747: 48 83 ec 10 sub $0x10,%rsp
74b: 89 7d fc mov %edi,-0x4(%rbp)
74e: 48 89 75 f0 mov %rsi,-0x10(%rbp)
// insert code here...
printf("[before]max_arr_count_index : %d\n", max_arr_count_index);
752: 8b 05 fc 08 20 00 mov 0x2008fc(%rip),%eax # 201054 <max_arr_count_index>
758: 89 c6 mov %eax,%esi
75a: 48 8d 3d 37 01 00 00 lea 0x137(%rip),%rdi # 898 <_IO_stdin_used+0x18>
761: b8 00 00 00 00 mov $0x0,%eax
766: e8 35 fe ff ff callq 5a0 <printf@plt>
printf("[before]The original array elements are :\n");
76b: 48 8d 3d 4e 01 00 00 lea 0x14e(%rip),%rdi # 8c0 <_IO_stdin_used+0x40>
772: e8 19 fe ff ff callq 590 <puts@plt>
print_all_arr(arr);
777: 48 8d 3d c2 08 20 00 lea 0x2008c2(%rip),%rdi # 201040 <arr>
77e: e8 6d ff ff ff callq 6f0 <print_all_arr>
arr[0] = 1;
783: c7 05 b3 08 20 00 01 movl $0x1,0x2008b3(%rip) # 201040 <arr>
78a: 00 00 00
arr[1] = 2;
78d: c7 05 ad 08 20 00 02 movl $0x2,0x2008ad(%rip) # 201044 <arr+0x4>
794: 00 00 00
arr[2] = 3;
797: c7 05 a7 08 20 00 03 movl $0x3,0x2008a7(%rip) # 201048 <arr+0x8>
79e: 00 00 00
arr[3] = 4;
7a1: c7 05 a1 08 20 00 04 movl $0x4,0x2008a1(%rip) # 20104c <arr+0xc>
7a8: 00 00 00
arr[4] = 5;
7ab: c7 05 9b 08 20 00 05 movl $0x5,0x20089b(%rip) # 201050 <arr+0x10>
7b2: 00 00 00
/* Point is this one.
If I assign arr[5] 30, then, max_arr_count_index is changed also as
30. if I assign arr[5] 10000 max_arr_count_index is assigned 10000.
*/
arr[5] = 30;
7b5: c7 05 95 08 20 00 1e movl $0x1e,0x200895(%rip) # 201054 <max_arr_count_index>
7bc: 00 00 00
/* Point is this one.
If I assign arr[5] 30, then, max_arr_count_index is changed also as
30. if I assign arr[5] 10000 max_arr_count_index is assigned 10000.
*/
printf("[after]max_arr_count_index : %d\n", max_arr_count_index);
7bf: 8b 05 8f 08 20 00 mov 0x20088f(%rip),%eax # 201054 <max_arr_count_index>
7c5: 89 c6 mov %eax,%esi
7c7: 48 8d 3d 22 01 00 00 lea 0x122(%rip),%rdi # 8f0 <_IO_stdin_used+0x70>
7ce: b8 00 00 00 00 mov $0x0,%eax
7d3: e8 c8 fd ff ff callq 5a0 <printf@plt>
printf("[after]The array elements after insertion :\n");
7d8: 48 8d 3d 39 01 00 00 lea 0x139(%rip),%rdi # 918 <_IO_stdin_used+0x98>
7df: e8 ac fd ff ff callq 590 <puts@plt>
print_all_arr(arr);
7e4: 48 8d 3d 55 08 20 00 lea 0x200855(%rip),%rdi # 201040 <arr>
7eb: e8 00 ff ff ff callq 6f0 <print_all_arr>
return 0;
7f0: b8 00 00 00 00 mov $0x0,%eax
}
As you can see, even at that level, the disassembler already knows that you are effectively setting max_arr_count_index. But why?
It is because the memory layout produced by GCC is simply that way. (We passed -g to gcc so that objdump -S can interleave the source lines with the instructions; the names shown next to the addresses come from the executable's symbol table.) You have a global array of five ints and a global int variable, declared right after each other. The global int variable simply sits right behind the array in memory: in the listing above, arr starts at 0x201040 and occupies 5 * 4 = 20 (0x14) bytes, so the next int lives at 0x201054, which is exactly the address of max_arr_count_index. Accessing the integer right behind the end of the array thus gives max_arr_count_index.
Remember that accessing element i of an array arr of e.g. ints is (at least on all architectures I know) simply accessing the memory at the byte address arr + sizeof(int)*i, where arr is the address of the first element. (In C source code, the pointer expression arr + i already includes the scaling by sizeof(int).)
As said, this is undefined behaviour. GCC could just as well place the global int variable before the array, which would lead to different effects, possibly even the program terminating when attempting to access arr[5] if there is no valid memory page at that location.