c++multidimensional-arrayc++17clangaddress-sanitizer

Why didn't std::copy to overflow the buffer; only got clang sanitizer warning "Undefined Behavior"?


I was concerned about a std::copy line in a code review. I ran clang sanitizers on it and it confirmed the smell. I went into the debugger and saw that the copy did not overflow into the next field in the structure as I expected, and I do not understand why that is the case.

In below program, in the std::copy, the 8 was intended to represent the 8 elements (of type long); but, the type of src.arr[0] is pointer to long[4]. So, I expected the 8 to try to copy 8 long[4] items thereby overflowing the array following arr; i.e., the overflowArray field. But in my test, it did not, and in the reviewed program it also worked without overflowing.

In below program is following line that clang reports as Undefined Behavior but currently gives desired results. One important question: Can this code on a different platform and/or compiler result in incorrect results.?

std::copy(src.arr[0], src.arr[0] + 8, dst.arr[0]);

#include <cstdlib>
#include <iostream>
#include <numeric>
#include <cstring>

struct Astruct
{
    long arr[2][4];
    long overflowArray[8];
};

void printAstruct(Astruct str)
{
    std::cout << "arr: ";
    for( size_t i = 0; i < 2; ++i)
    {
        for( size_t j = 0; j < 4; ++j)
        {
            std::cout << str.arr[i][j] << " ";
        }
    }
    std::cout << ";  overflowArray: ";
    for( size_t k = 0; k < 8; ++k)
    {
        std::cout << str.overflowArray[k] << " ";
    }
    std::cout << std::endl;
}

void fillArray(Astruct& );

int main()
{
    Astruct src{};
    Astruct dst{};

    std::cout << "src size: " << sizeof(src.arr) << "; dst size:" << sizeof(src.overflowArray) << std::endl;
    // SETUP
    std::cout << "dst before copy" << std::endl;
    printAstruct(dst);
    long* srcStart = &src.arr[0][0];
    memset(srcStart, 0xFF, sizeof(src.arr));

    std::cout << "src:" << std::endl;
    printAstruct(src);
    
    // the Smell: src.arr[0] is type: array of 4 longs:
    std::copy(src.arr[0], src.arr[0] + 8, dst.arr[0]); // <--  Undefined Behavior

    std::cout << "\ndst after copy:" << std::endl;
    printAstruct(dst);
}

Output:

src size: 32; dst size:32
dst before copy
arr: 0 0 0 0 0 0 0 0 ;  overflowArray: 0 0 0 0 0 0 0 0 
src:
arr: -1 -1 -1 -1 -1 -1 -1 -1 ;  overflowArray: 0 0 0 0 0 0 0 0 

dst after copy:
arr: -1 -1 -1 -1 -1 -1 -1 -1 ;  overflowArray: 0 0 0 0 0 0 0 0  <-- I expected these 0's to be overwritten

It appears that I was correct in seeing a smell as confirmed by clang sanitizer, but I do not understand why the overflow did not occur, since 8 should have advanced the end address past the 8 longs.

Ran clang with sanitizers and confirmed the smell:

runtime error: index 8 out of bounds for type 'long[4]'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior

Why wasn't overflowArray overwritten? Could it be overwritten in another platform and/or compiler?

I changed the code to:

std::copy(&src.arr[0][0], &src.arr[0][0] + 8, &dst.arr[0][0]); 

since &src.arr[0][0] is a pointer to long. But was told this was too confusing, and the addresses are the same in both the orignal or in my change.


Solution

  • Lets look at your struct.

    struct Astruct
    {
      long arr[2][4];
      long overflowArray[8];
    };
    

    it looks like this:

    arr[0][0] arr[0][1] arr[0][2] arr[0][3]
    arr[1][0] arr[1][1] arr[1][2] arr[1][3]
    ovrflw[0] ovrflw[1] ovrflw[2] ovrflw[3] ovrflw[4] ovrflw[5] ovrflw[6] ovrflw[7]
    

    there are 8 longs for arr[][] and then 8 longs for overflowArray (I made it shorter for spacing purposes).

    Now, what is going on here:

    // the Smell: src.arr[0] is type: array of 4 longs:
    std::copy(src.arr[0], src.arr[0] + 8, dst.arr[0]); // <--  Undefined Behavior
    

    so, src.arr[0] is of type long[4]. However, the first argument of std::copy is a value parameter. So it is being passed by value.

    When you pass a long[4] by value, it decays into long* pointing to the first element of the array.

    Similarly, when you do src.arr[0]+8 it says "ok, adding long[4] with an int. That makes no sense! Let's try decaying!" and it again converts the long[4] into a long* pointing at the first element of the array.

    Going back to the memory:

    \/ decay of arr[0] points here
    arr[0][0] arr[0][1] arr[0][2] arr[0][3]
    arr[1][0] arr[1][1] arr[1][2] arr[1][3]
    \/ arr[0]+8 points here
    ovrflw[0] ovrflw[1] ovrflw[2] ovrflw[3] ovrflw[4] ovrflw[5] ovrflw[6] ovrflw[7]
    

    Now, this is violating part of the C++ standard; in particular, we have a pointer into an array of length 4 and we are advancing 8 steps.

    In practice treading [2][4] arrays like [8] arrays works. If some wording in the C++ standard says this isn't allowed? Compilers ignore it. The use of [a][b][c] arrays (with compile time constant size) as variants of [a*b*c] arrays is far too common in C and C++ code bases to enforce any kind of UB here.

    If you accept that you can treat long arr[2][4] as a contiguous block of 8 longs, then arr[0] through arr[0]+8 (once suitably decayed) are indeed pointers to the start and one-past-the-end of this block of memory.