Is there any data on AVX2 gather latency?
(For instance, a _mm256_i32gather_ps whose elements all fall within a single cache line.)
The Intel Intrinsics Guide lists latency and throughput figures for many intrinsics.
For _mm256_i32gather_ps it lists a latency of 6 (in cycles).
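For reference, the single-cache-line case from the question might look roughly like the sketch below. It is only an illustration: the names table, gather8 and gather_demo are made up here, a 64-byte cache line is assumed, and the target attribute and __builtin_cpu_supports are GCC/Clang specifics. It shows the access pattern, not a latency measurement.

```c
#include <immintrin.h>

/* 16 floats = 64 bytes. Aligning to 64 keeps the whole table inside one
   cache line, assuming 64-byte lines (an assumption, not from the guide). */
static _Alignas(64) float table[16];

/* Hypothetical helper: out[i] = base[idx[i]] for i = 0..7.
   The target attribute (GCC/Clang) avoids needing -mavx2 for the whole file. */
__attribute__((target("avx2")))
static void gather8(const float *base, const int *idx, float *out)
{
    __m256i vindex = _mm256_loadu_si256((const __m256i *)idx);
    /* scale = 4 because the indices count floats, not bytes */
    __m256 v = _mm256_i32gather_ps(base, vindex, 4);
    _mm256_storeu_ps(out, v);
}

/* Fills the table with table[i] = 10 * i and gathers eight elements in a
   scattered order. Returns 1 on success, 0 if the CPU lacks AVX2. */
int gather_demo(float *out)
{
    static const int idx[8] = {15, 0, 7, 3, 12, 1, 9, 4};

    if (!__builtin_cpu_supports("avx2"))
        return 0;

    for (int i = 0; i < 16; ++i)
        table[i] = (float)(i * 10);

    gather8(table, idx, out);
    return 1;
}
```

To actually time latency rather than throughput, each gather would have to depend on the result of the previous one (for example by deriving the next index vector from the gathered values), otherwise the CPU can overlap independent gathers.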