c++ image-processing compiler-optimization vectorization 24-bit

Do modern C++ compilers auto-vectorize code for 24-bit image processing?


Do compilers like GCC, Visual Studio C++, the Intel C++ compiler, Clang, etc. vectorize code like the following?

std::vector<unsigned char> img( height * width * 3 );
unsigned char channelMultiplier[3];

// ... initialize img and channelMultiplier ...

for ( int y = 0; y < height; ++y )
    for ( int x = 0; x < width; ++x )
        for ( int b = 0; b < 3; ++b )
            img[ b+3*(x+width*y) ] = img[ b+3*(x+width*y) ] * 
                                     channelMultiplier[b] / 0x100;

How about the same for 32-bit image processing?


Solution

  • I do not think your triple loop will auto-vectorize. IMO the problems are:

    In MSVC you can request alignment with __declspec(align(32)) double array[size], but check the documentation of the specific compiler you are using to make sure you are using the correct alignment directive.
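As a portable alternative to the compiler-specific directive above, here is a minimal sketch using the standard C++11 alignas specifier. The names kSimdAlign, kRowBytes, isAligned and scratchRow are hypothetical, and 32 bytes is an assumption chosen to match the width of an AVX register.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed values for illustration: 32 bytes covers AVX (16 is enough for SSE),
// and one row of 1024 RGB pixels.
constexpr std::size_t kSimdAlign = 32;
constexpr std::size_t kRowBytes  = 3 * 1024;

// Returns true when p is a multiple of alignment.
inline bool isAligned(const void* p, std::size_t alignment)
{
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Hypothetical scratch buffer: alignas is the standard spelling of what
// __declspec(align(32)) does in MSVC.
inline unsigned char* scratchRow()
{
    alignas(kSimdAlign) static unsigned char row[kRowBytes];
    assert(isAligned(row, kSimdAlign));
    return row;
}
```

Note that alignas only affects the object it is applied to. Putting it on a std::vector variable does not align the heap storage the vector allocates, so img from the question would need an aligned allocator or a manually allocated buffer.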

    Another important thing: if you use the GNU compiler, use the flag -ftree-vectorizer-verbose=6 to check whether your loop is being auto-vectorized (newer GCC releases report this through -fopt-info-vec instead). If you use the Intel compiler, use -vec-report5. The numbers 6 and 5 are verbosity levels, and several levels are available, so check the compiler documentation. The higher the verbosity level, the more vectorization information you get for every loop in your code, but the slower the compiler will compile in Release mode.
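To have something concrete to run the vectorizer report on, here is a rough sketch of the question's loop flattened into a single pass over pixels. The function name scaleChannels and its signature are hypothetical, and whether any given compiler vectorizes it is exactly what the report flags are meant to tell you, so treat this as a test case rather than a guaranteed win.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical kernel: scales each channel of an interleaved 24-bit image in
// place, computing the same value * multiplier / 0x100 as the question's loop.
//
// Possible way to inspect the result (file name is an assumption):
//   g++ -O3 -ftree-vectorizer-verbose=6 -c scale.cpp   (older GCC)
//   g++ -O3 -fopt-info-vec -c scale.cpp                (newer GCC)
void scaleChannels(std::uint8_t* img, std::size_t pixelCount,
                   const std::uint8_t mult[3])
{
    assert(img != nullptr && mult != nullptr);

    // Copy the multipliers into locals so that stores through img cannot be
    // assumed to modify them.
    const unsigned m0 = mult[0];
    const unsigned m1 = mult[1];
    const unsigned m2 = mult[2];

    // One flat loop with a simple trip count instead of three nested loops.
    for (std::size_t p = 0; p < pixelCount; ++p) {
        std::uint8_t* px = img + 3 * p;
        px[0] = static_cast<std::uint8_t>(px[0] * m0 / 0x100);
        px[1] = static_cast<std::uint8_t>(px[1] * m1 / 0x100);
        px[2] = static_cast<std::uint8_t>(px[2] * m2 / 0x100);
    }
}
```

With the question's data this would be called as scaleChannels(img.data(), width * height, channelMultiplier).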

    In general, I have always been surprised at how hard it is to get the compiler to auto-vectorize. It is a common mistake to assume that because a loop looks canonical, the compiler will auto-vectorize it auto-magically.

    UPDATE: one more thing: make sure your img is actually page-aligned, e.g. with posix_memalign((void**) &buffer, sysconf(_SC_PAGESIZE), size*sizeof(double)); (page alignment implies AVX and SSE alignment). The problem is that if you have a big image, this loop will most likely end up page-switching during execution, and that is also very expensive. I think these are the so-called TLB misses.
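The page-aligned allocation mentioned in the update could look roughly like this for the image buffer. It is a sketch that assumes a POSIX system, and the helper name allocPageAligned is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <stdlib.h>   // posix_memalign, free (POSIX)
#include <unistd.h>   // sysconf, _SC_PAGESIZE (POSIX)

// Hypothetical helper: returns a buffer of the given size whose address is a
// multiple of the system page size, or nullptr on failure.
// The caller releases it with free().
unsigned char* allocPageAligned(std::size_t bytes)
{
    const long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    void* p = nullptr;
    if (posix_memalign(&p, static_cast<std::size_t>(page), bytes) != 0)
        return nullptr;
    return static_cast<unsigned char*>(p);
}
```

For the question's image this would be allocPageAligned(height * width * 3) in place of the std::vector, with a matching free() once the image is no longer needed.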