I have two functions counting the occurrences of a target char in the given input buffer. The functions vary only in how they communicate the result back to the caller; one returns the result and the other writes to a variable passed by reference.
#include <cstdlib>
#define BUF_LEN 0x1000
size_t check_count1(const char* buf, char target) {
    size_t count = 0;
    for (size_t i = 0; i < BUF_LEN; i++) {
        if (buf[i] == target) {
            count++;
        }
    }
    return count;
}
void check_count2(const char* buf, char target, size_t& count) {
    for (size_t i = 0; i < BUF_LEN; i++) {
        if (buf[i] == target) {
            count++;
        }
    }
}
I am puzzled by the code Clang and GCC generate for these two functions. The loop in check_count1 is vectorized, but the one in check_count2 is not. Initially I thought this was due to pointer aliasing in the second case, but adding __restrict has no effect. Here's the link to Compiler Explorer.
An older ICC compiler did just fine with both loops. What changed?
One reason is pointer aliasing, as NathanOliver points out in a comment.
Another reason is that, as written, the function never writes to count when buf[i] == target is false for every i, and the compiler must preserve that: it may not introduce a store to count that the source does not perform. This doesn't matter if count is a local variable, where an extra store is unobservable, but it does matter if count refers to an object in read-only memory (rodata†), where a store is not allowed.
If the loop body is changed so that it always modifies count, GCC and Clang will vectorize it. For example, this version (which also marks count as __restrict to rule out aliasing) is vectorized:
void check_count2(const char* buf, char target, size_t& __restrict count) {
    for (size_t i = 0; i < BUF_LEN; i++) {
        if (buf[i] == target) {
            count++;
        } else {
            count++;
            count--;
        }
    }
}
As discussed in "Crash with icc: can the compiler invent writes where none existed in the abstract machine?", ICC invents writes even when it is not allowed to do so.
†: This can happen if count is initialized from const_cast<size_t&>(n), where n is a const global variable. Note that casting away constness is not undefined behavior; modifying a const object is.