I pass in a vector that is sorted in descending order and manually sum until a certain integer is reached. In the first function below, I let the for loop continue until the end, but in the second function I break out of the for loop. Weirdly enough, breaking out of the for loop is three times slower. I expected it to be quicker, as it doesn't have to finish the for loop.
Any ideas?
using BenchmarkTools
function manual_sum_whole_vector(in_vec, i)
n = 0
for x in in_vec
if x >= i
n += 1
end
end
return n
end
function manual_sum_break(in_vec, i)
n = 0
for x in in_vec
if x >= i
n += 1
else
break
end
end
return n
end
in_vec = collect(100000:-1:1)
@btime manual_sum_whole_vector(in_vec, 12345)
@btime manual_sum_break(in_vec, 12345)
6.575 μs (1 allocation: 16 bytes)
19.625 μs (1 allocation: 16 bytes)
My surmise is that the code without a break is getting vectorized by the compiler, while the code with a break cannot be vectorized and must remain as a scalar loop. Because the loop exit happens almost 90% of the way through the array, the inefficiency of iterating through the last 10% of the array is small compared to the gains from vectorization.