I am doing some image processing, for which I benefit from vectorization.
I have a function that vectorizes fine, but I cannot convince the compiler that the input and output buffers do not overlap, so that no alias checking is necessary.
I should be able to do so using __restrict__, but if the buffers are not declared as __restrict__ when they arrive as function arguments, there seems to be no way to tell the compiler that I am absolutely sure the two buffers will never overlap.
This is the function:
__attribute__((optimize("tree-vectorize","tree-vectorizer-verbose=6")))
void threshold(const cv::Mat& inputRoi, cv::Mat& outputRoi, const unsigned char th) {
    const int height = inputRoi.rows;
    const int width = inputRoi.cols;
    for (int j = 0; j < height; j++) {
        // Row pointers, cast to __restrict in the hope of dropping the alias check
        const uint8_t* __restrict in = (const uint8_t* __restrict) inputRoi.ptr(j);
        uint8_t* __restrict out = (uint8_t* __restrict) outputRoi.ptr(j);
        for (int i = 0; i < width; i++) {
            // Pixels below the threshold become 255, all others 0
            out[i] = (in[i] < th) ? 255 : 0;
        }
    }
}
The only way I can convince the compiler not to perform the alias check is to put the inner loop in a separate function whose pointer arguments are declared __restrict__. If I declare this inner function as inline, the alias check is activated again.
You can see the effect also with this example, which I think is consistent.
(Note: I know there might be better ways of writing the same function, but in this case I am just trying to understand how to avoid alias check)
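To make the workaround described above concrete, here is a minimal sketch of the "separate function" variant. It works on a plain row-major buffer instead of cv::Mat so that it stays self-contained; the names thresholdRow and thresholdImage are hypothetical, and the noinline attribute is my assumption about how to keep the helper from being inlined.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of OpenCV): the pointers are __restrict__
// at the function boundary, which is where the compiler honoured them in
// the question. noinline keeps the helper out of the caller, since inlining
// was reported to bring the alias check back.
__attribute__((noinline))
static void thresholdRow(const std::uint8_t* __restrict__ in,
                         std::uint8_t* __restrict__ out,
                         int width, std::uint8_t th) {
    for (int i = 0; i < width; i++) {
        out[i] = (in[i] < th) ? 255 : 0;
    }
}

// Hypothetical caller: walks a contiguous height x width image row by row.
// With cv::Mat the two row pointers would come from ptr(j) instead.
void thresholdImage(const std::uint8_t* in, std::uint8_t* out,
                    int height, int width, std::uint8_t th) {
    for (int j = 0; j < height; j++) {
        const std::size_t offset = static_cast<std::size_t>(j) * width;
        thresholdRow(in + offset, out + offset, width, th);
    }
}
```

The price of this variant is one function call per row, which is usually small compared to the work done on the row itself.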
Edit:
Problem is solved!! (See answer below)
Using gcc 4.9.2, here is the complete example. Note the use of the compiler flag -fopt-info-vec-optimized in place of the superseded -ftree-vectorizer-verbose=N.
So, for gcc, use #pragma GCC ivdep and enjoy! :)
If you are using the Intel compiler, you can try adding the line:
#pragma ivdep
The following paragraph is quoted from the Intel compiler user manual:
The ivdep pragma instructs the compiler to ignore assumed vector dependencies. To ensure correct code, the compiler treats an assumed dependence as a proven dependence, which prevents vectorization. This pragma overrides that decision. Use this pragma only when you know that the assumed loop dependencies are safe to ignore.
In gcc, one should add the line:
#pragma GCC ivdep
inside the function and right before the loop you want to vectorize (see documentation). This is only supported starting from gcc 4.9 and, by the way, makes the use of __restrict__ redundant.
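As an illustration of where the pragma goes, here is a minimal sketch of the threshold loop with #pragma GCC ivdep placed directly before the inner loop. It uses a plain row-major buffer rather than cv::Mat to stay self-contained, and the function name thresholdIvdep is hypothetical. The pragma is a promise from you to the compiler: if the buffers do overlap, the result may be wrong.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical function name; no __restrict__ is used anywhere.
// The caller guarantees that in and out never overlap.
void thresholdIvdep(const std::uint8_t* in, std::uint8_t* out,
                    int height, int width, std::uint8_t th) {
    for (int j = 0; j < height; j++) {
        const std::size_t offset = static_cast<std::size_t>(j) * width;
        const std::uint8_t* inRow = in + offset;
        std::uint8_t* outRow = out + offset;
        // Tell GCC (4.9 or later) to ignore assumed dependencies in the
        // loop that immediately follows.
#pragma GCC ivdep
        for (int i = 0; i < width; i++) {
            outRow[i] = (inRow[i] < th) ? 255 : 0;
        }
    }
}
```

Compiling with something like g++ -O3 -fopt-info-vec-optimized should report whether the inner loop was vectorized; the exact wording of the report depends on the gcc version.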