
In C, why can I see a value written past the end of an array in a different variable?


Lately I've been spending my spare time experimenting with C for fun.

Along the way I found something interesting that I still can't explain.

max_arr_count_index takes on whatever value I write to arr[5], which is one element past the end of the array.

Can someone explain this to me? I know it should not be done: writing one element past the end of the array (arr[5] = 30 in the problem case) is unsafe, and the standard defines it as undefined behavior.

I am not going to do this in real code; I just want to understand what happens under the hood.

Clang and GCC give me the same result.

The code and results are below:

[No-problem case: I do not write past the end of the array]

#include <stdio.h>

int arr[] = {11,33,55,77,88};
int max_arr_count_index = (sizeof(arr) / sizeof(arr[0]));

// print all
void print_all_arr(int* arr)
{
    // print every element together with its index
    for(int i = 0; i < max_arr_count_index; i++) {
        printf("arr[%d] = %d \n", i, arr[i]);
    }
}

int main(int argc, const char * argv[]) {
    // insert code here...
    printf("[before]max_arr_count_index : %d\n", max_arr_count_index);
    printf("[before]The original array elements are :\n");
    print_all_arr(arr);
    arr[0] = 1;
    arr[1] = 2;
    arr[2] = 3;
    arr[3] = 4;
    arr[4] = 5;
    // arr[5] = 1000;
    printf("[after]max_arr_count_index : %d\n", max_arr_count_index);
    printf("[after]The array elements after :\n");

    print_all_arr(arr);

    return 0;
}

Output of the no-problem case:

[before]max_arr_count_index : 5
[before]The original array elements are :
arr[0] = 11 
arr[1] = 33 
arr[2] = 55 
arr[3] = 77 
arr[4] = 88 
[after]max_arr_count_index : 5
[after]The array elements after :
arr[0] = 1 
arr[1] = 2 
arr[2] = 3 
arr[3] = 4 
arr[4] = 5 
Program ended with exit code: 0

[Problem case: I write a value one element past the end of the array]

#include <stdio.h>

int arr[] = {11,33,55,77,88};
int max_arr_count_index = (sizeof(arr) / sizeof(arr[0]));

// print all
void print_all_arr(int* arr)
{
    // print every element together with its index
    for(int i = 0; i < max_arr_count_index; i++) {
        printf("arr[%d] = %d \n", i, arr[i]);
    }
}

int main(int argc, const char * argv[]) {
    // insert code here...
    printf("[before]max_arr_count_index : %d\n", max_arr_count_index);
    printf("[before]The original array elements are :\n");
    print_all_arr(arr);
    arr[0] = 1;
    arr[1] = 2;
    arr[2] = 3;
    arr[3] = 4;
    arr[4] = 5;

    /* Point is this one. 
       If I assign arr[5] 30, then, max_arr_count_index is changed also as            
       30. if I assign arr[5] 10000 max_arr_count_index is assigned 10000.
    */

    arr[5] = 30;

    /* Point is this one. 
       If I assign arr[5] 30, then, max_arr_count_index is changed also as            
       30. if I assign arr[5] 10000 max_arr_count_index is assigned 10000.
    */

    printf("[after]max_arr_count_index : %d\n", max_arr_count_index);
    printf("[after]The array elements after arr[5] is assigned 30 :\n");

    print_all_arr(arr);

    return 0;
}

Output of the problem case:

[before]max_arr_count_index : 5
[before]The original array elements are :
arr[0] = 11 
arr[1] = 33 
arr[2] = 55 
arr[3] = 77 
arr[4] = 88 
[after]max_arr_count_index : 30
[after]The array elements after arr[5] is assigned 30 :
arr[0] = 1 
arr[1] = 2 
arr[2] = 3 
arr[3] = 4 
arr[4] = 5 
arr[5] = 30 
arr[6] = 0 
arr[7] = 0 
arr[8] = 0 
arr[9] = 0 
arr[10] = 0 
arr[11] = 0 
arr[12] = 0 
arr[13] = 0 
arr[14] = 0 
arr[15] = 0 
arr[16] = 0 
arr[17] = 0 
arr[18] = 0 
arr[19] = 0 
arr[20] = 0 
arr[21] = 0 
arr[22] = 0 
arr[23] = 0 
arr[24] = 0 
arr[25] = 0 
arr[26] = 0 
arr[27] = 0 
arr[28] = 0 
arr[29] = 0 
Program ended with exit code: 0

Solution

  • As far as the C standard is concerned, this is undefined behaviour: the compiler could make demons fly out of your nose and still be conforming.

    But you asked for "under the hood", so we have to look at the generated machine code. An excerpt (produced with gcc -g -o test test.c and objdump -S --disassemble test) is:

    int main(int argc, const char * argv[]) {
     743:   55                      push   %rbp
     744:   48 89 e5                mov    %rsp,%rbp
     747:   48 83 ec 10             sub    $0x10,%rsp
     74b:   89 7d fc                mov    %edi,-0x4(%rbp)
     74e:   48 89 75 f0             mov    %rsi,-0x10(%rbp)
        // insert code here...
        printf("[before]max_arr_count_index : %d\n", max_arr_count_index);
     752:   8b 05 fc 08 20 00       mov    0x2008fc(%rip),%eax        # 201054 <max_arr_count_index>
     758:   89 c6                   mov    %eax,%esi
     75a:   48 8d 3d 37 01 00 00    lea    0x137(%rip),%rdi        # 898 <_IO_stdin_used+0x18>
     761:   b8 00 00 00 00          mov    $0x0,%eax
     766:   e8 35 fe ff ff          callq  5a0 <printf@plt>
        printf("[before]The original array elements are :\n");
     76b:   48 8d 3d 4e 01 00 00    lea    0x14e(%rip),%rdi        # 8c0 <_IO_stdin_used+0x40>
     772:   e8 19 fe ff ff          callq  590 <puts@plt>
        print_all_arr(arr);
     777:   48 8d 3d c2 08 20 00    lea    0x2008c2(%rip),%rdi        # 201040 <arr>
     77e:   e8 6d ff ff ff          callq  6f0 <print_all_arr>
        arr[0] = 1;
     783:   c7 05 b3 08 20 00 01    movl   $0x1,0x2008b3(%rip)        # 201040 <arr>
     78a:   00 00 00 
        arr[1] = 2;
     78d:   c7 05 ad 08 20 00 02    movl   $0x2,0x2008ad(%rip)        # 201044 <arr+0x4>
     794:   00 00 00 
        arr[2] = 3;
     797:   c7 05 a7 08 20 00 03    movl   $0x3,0x2008a7(%rip)        # 201048 <arr+0x8>
     79e:   00 00 00 
        arr[3] = 4;
     7a1:   c7 05 a1 08 20 00 04    movl   $0x4,0x2008a1(%rip)        # 20104c <arr+0xc>
     7a8:   00 00 00 
        arr[4] = 5;
     7ab:   c7 05 9b 08 20 00 05    movl   $0x5,0x20089b(%rip)        # 201050 <arr+0x10>
     7b2:   00 00 00 
        /* Point is this one. 
           If I assign arr[5] 30, then, max_arr_count_index is changed also as            
           30. if I assign arr[5] 10000 max_arr_count_index is assigned 10000.
        */
    
        arr[5] = 30;
     7b5:   c7 05 95 08 20 00 1e    movl   $0x1e,0x200895(%rip)        # 201054 <max_arr_count_index>
     7bc:   00 00 00 
        /* Point is this one. 
           If I assign arr[5] 30, then, max_arr_count_index is changed also as            
           30. if I assign arr[5] 10000 max_arr_count_index is assigned 10000.
        */
    
        printf("[after]max_arr_count_index : %d\n", max_arr_count_index);
     7bf:   8b 05 8f 08 20 00       mov    0x20088f(%rip),%eax        # 201054 <max_arr_count_index>
     7c5:   89 c6                   mov    %eax,%esi
     7c7:   48 8d 3d 22 01 00 00    lea    0x122(%rip),%rdi        # 8f0 <_IO_stdin_used+0x70>
     7ce:   b8 00 00 00 00          mov    $0x0,%eax
     7d3:   e8 c8 fd ff ff          callq  5a0 <printf@plt>
        printf("[after]The array elements after insertion :\n");
     7d8:   48 8d 3d 39 01 00 00    lea    0x139(%rip),%rdi        # 918 <_IO_stdin_used+0x98>
     7df:   e8 ac fd ff ff          callq  590 <puts@plt>
    
        print_all_arr(arr);
     7e4:   48 8d 3d 55 08 20 00    lea    0x200855(%rip),%rdi        # 201040 <arr>
     7eb:   e8 00 ff ff ff          callq  6f0 <print_all_arr>
    
        return 0;
     7f0:   b8 00 00 00 00          mov    $0x0,%eax
    }
    

    As you can see at address 7b5, the store generated for arr[5] = 30 targets 0x201054, which the disassembler labels max_arr_count_index. Even at that level, you are effectively setting max_arr_count_index. But why?

    It is because that is simply the memory layout GCC produced here (we passed -g to gcc so that it embeds debug information and the disassembler can tell which memory location belongs to which variable). You have a global array of five ints and a global int variable, declared one right after the other. In the listing, arr starts at 0x201040 and max_arr_count_index sits at 0x201054, exactly 20 bytes (five 4-byte ints) later, so the variable lies directly behind the array in memory. Accessing the int right behind the end of the array thus gives you max_arr_count_index.
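
    If you want to check the layout without triggering undefined behaviour, you can compare addresses instead of writing through arr[5]. The following is only a sketch: gap_after_arr and print_layout are made-up helper names, and converting pointers to uintptr_t assumes a flat address space, as on common desktop platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Same two globals as in the question. */
int arr[] = {11, 33, 55, 77, 88};
int max_arr_count_index = (int)(sizeof(arr) / sizeof(arr[0]));

/* Hypothetical helper: signed distance in bytes from the end of arr to
   max_arr_count_index. 0 means the variable sits directly behind the array. */
static ptrdiff_t gap_after_arr(void)
{
    /* arr + 5 is the one-past-the-end pointer; forming it is allowed,
       only dereferencing it is undefined. */
    uintptr_t end_of_arr = (uintptr_t)(arr + sizeof(arr) / sizeof(arr[0]));
    uintptr_t variable   = (uintptr_t)&max_arr_count_index;
    return (ptrdiff_t)(variable - end_of_arr);
}

/* Prints both addresses and the gap so the layout can be compared
   with the disassembly. */
static void print_layout(void)
{
    assert(sizeof(arr) == 5 * sizeof(int));
    printf("arr                 : %p .. %p\n", (void *)arr, (void *)(arr + 5));
    printf("max_arr_count_index : %p\n", (void *)&max_arr_count_index);
    printf("gap in bytes        : %td\n", gap_after_arr());
}
```

    Calling print_layout() from main shows where the two globals ended up in your build. A gap of 0 would match the listing above; another compiler, optimisation level or declaration order may well give a different value.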

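    Remember that accessing element i of an array arr of, e.g., ints is (at least on all architectures I know) simply accessing the memory at the byte address arr + sizeof(int)*i, where arr is the address of the first element. (In C source the scaling is implicit: arr[i] means *(arr + i), and the compiler multiplies i by the element size for you.) Nothing in that computation checks i against the array length.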
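
    The address computation can be sketched in C itself. This is an illustration, not something from the original program: element_address and index_matches_pointer_math are made-up names, and the uintptr_t arithmetic again assumes a flat address space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: byte address of element i computed by hand as
   base address + sizeof(int) * i. */
static uintptr_t element_address(const int *base, size_t i)
{
    assert(base != NULL);
    return (uintptr_t)base + sizeof(int) * i;
}

/* Returns 1 if, for every valid index, the hand-computed address equals the
   address the compiler uses for a[i], and a[i] equals *(a + i). */
static int index_matches_pointer_math(const int *a, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if ((uintptr_t)&a[i] != element_address(a, i))
            return 0;
        if (a[i] != *(a + i))   /* a[i] is defined as *(a + i) */
            return 0;
    }
    return 1;
}
```

    The loop only visits indices inside the array, so the sketch itself stays within defined behaviour. The same formula with i = 5 yields the address just behind the array, which is where the store at 7b5 ends up.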

    As said, this is undefined behaviour. GCC could also order the global int variable before the array, which would lead to different effects, possibly even the program terminating when attempting to access arr[5] if there is no valid memory page at that location.
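
    Since the language does not check the index for you, real code has to do it itself. A minimal sketch, assuming the caller knows the element count (checked_store is a made-up helper name, not a standard function):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: stores value at a[index] only if index lies inside
   the array. Returns false and writes nothing for an out-of-range index. */
static bool checked_store(int *a, size_t count, size_t index, int value)
{
    assert(a != NULL);
    if (index >= count)
        return false;   /* would be past the end: refuse instead of writing */
    a[index] = value;
    return true;
}
```

    With this, checked_store(arr, 5, 5, 30) returns false and leaves memory untouched, whatever happens to sit behind arr.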