I'm trying t understand a problem we cleared recently when using Clang 5.0 and Undefined Behavior Sanitizer (UBsan). We have code that processes a buffer in the forward or backwards direction. The reduced case is similar to the code shown below.
The 0-len
may look a little unusual, but it is needed for early Microsoft .Net compilers. Clang 5.0 and UBsan produced integer overflow findings:
adv-simd.h:1138:26: runtime error: addition of unsigned offset to 0x000003f78cf0 overflowed to 0x000003f78ce0
adv-simd.h:1140:26: runtime error: addition of unsigned offset to 0x000003f78ce0 overflowed to 0x000003f78cd0
adv-simd.h:1142:26: runtime error: addition of unsigned offset to 0x000003f78cd0 overflowed to 0x000003f78cc0
...
Lines 1138, 1140, 1142 (and friends) are the increment, which may
stride backwards due to the 0-len
.
ptr += inc;
According to Pointer comparisons in C. Are they signed or unsigned? (which also discusses C++), pointers are neither signed nor unsigned. Our offsets were unsigned and we relied on unsigned integer wrap to achieve the reverse stride.
The code was fine under GCC UBsan and Clang 4 and earlier UBsan. We eventually cleared it for Clang 5.0 with help with the LLVM devs. Instead of size_t
we needed to use ptrdiff_t
.
My question is, where was the integer overflow/undefined behavior in the construction? How did ptr + <unsigned>
result in signed integer overflow and lead to undefined behavior?
Here is an MSVC that mirrors the real code.
#include <cstddef>
#include <cstdint>
using namespace std;
uint8_t buffer[64];
int main(int argc, char* argv[])
{
uint8_t * ptr = buffer;
size_t len = sizeof(buffer);
size_t inc = 16;
// This sets up processing the buffer in reverse.
// A flag controls it in the real code.
if (argc%2 == 1)
{
ptr += len - inc;
inc = 0-inc;
}
while (len > 16)
{
// process blocks
ptr += inc;
len -= 16;
}
return 0;
}
The definition of adding an integer to a pointer is (N4659 expr.add/4):
I used an image here in order to preserve the formatting (this will be discussed below).
Note that this is a new wording which replaces a less-clear description from previous standards.
In your code (when argc
is odd) we end up with code equivalent to:
uint8_t buffer[64];
uint8_t *ptr = buffer + 48;
ptr = ptr + (SIZE_MAX - 15);
For the variables in the standard quote applied to your code, i
is 48
and j
is (SIZE_MAX - 15)
and n
is 64
.
The question now is whether it is true that 0 ≤ i + j ≤ n. If we interpret "i + j" to mean the result of the expression i + j
, then that equals 32
which is less than n
. But if it means the mathematical result then it is much greater than n
.
The standard uses a font for mathematical equations here and does not use the font for source code. ≤
is not a valid operator either. So I think they intend this equation to describe the mathematical value i.e. this is undefined behaviour.