
Is there any data on the latency of an AVX2 gather instruction?


Is there any published data on the latency of AVX2 gather instructions, for instance a _mm256_i32gather_ps (vgatherdps) whose eight elements all fall in a single cache line?


Solution

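  • The Intel Intrinsics Guide lists latency and throughput figures, per microarchitecture, for many intrinsics:

    Intel Intrinsics Guide

    It gives a latency of 6 cycles for _mm256_i32gather_ps. Gather latency differs between microarchitectures, so check the entry for your target CPU.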
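
To check a figure like this on a particular CPU, one option is a dependent-chain microbenchmark. The following is only a rough sketch, not something taken from the guide: the helper names (chained_gather_ticks, measure_gather_ticks) are made up for illustration, and it assumes GCC or Clang on x86-64. Each gather's result is converted back into the indices for the next gather, so the iterations cannot overlap and the time per iteration approximates gather latency plus the latency of the float-to-int conversion.

```c
#include <assert.h>
#include <stdint.h>
#include <x86intrin.h>   /* AVX2 intrinsics and __rdtsc (GCC/Clang) */

/* 16 floats aligned to 64 bytes, so every element sits in one cache line. */
static float line[16] __attribute__((aligned(64)));

/* Hypothetical helper: runs `iters` gathers as one dependency chain and
   returns the average number of TSC ticks per iteration. */
__attribute__((target("avx2")))
static double chained_gather_ticks(long iters, int last_idx[8])
{
    for (int i = 0; i < 16; i++)
        line[i] = (float)i;              /* line[i] == i, so indices stay 0..7 */

    __m256i idx = _mm256_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7);
    uint64_t start = __rdtsc();
    for (long n = 0; n < iters; n++) {
        /* scale 4 = sizeof(float); all eight loads hit the same cache line */
        __m256 v = _mm256_i32gather_ps(line, idx, 4);
        /* feed the result back as indices; this adds its own latency */
        idx = _mm256_cvttps_epi32(v);
    }
    uint64_t stop = __rdtsc();

    _mm256_storeu_si256((__m256i *)last_idx, idx);
    return (double)(stop - start) / (double)iters;
}

/* Returns -1.0 when the CPU does not report AVX2 support. */
double measure_gather_ticks(long iters, int last_idx[8])
{
    assert(iters > 0);
    if (!__builtin_cpu_supports("avx2"))
        return -1.0;
    return chained_gather_ticks(iters, last_idx);
}
```

Calling measure_gather_ticks with a large iteration count (say 100 million) and printing the result gives an estimate per iteration. Note that __rdtsc counts reference ticks, which may not match core clock cycles when the CPU is running at a turbo or reduced frequency, so treat the number as approximate.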