x86floating-pointcpu-architecturesimdflops

How many float multiplies can be performed with a single core of the current Intel architectures?


Trying to assess the performance gain from an embedded architecture I tried to search for the number of floating point multiplies that can be performed in a cycle on a single core of the Core 2 and Core i7 architectures, but could not find a quick answer to that. Unfortunately I am not familiar with the ISA so I cannot tell that from looking at the respective instructions. I assume it would be some kind of a SIMD instruction. Any idea?


Solution

  • One thing: Core 2 is not Intel's latest architecture. That would be Sandy Bridge.

    Core 2 and Core i7 Nehalem, can sustain 1 SSE multiply/cycle. Each SSE instruction can handle up to 4 single-precision or 2 double-precision. So that's 2 DP or 4 SP floating-point multiplies per cycle.

    Core i7 Sandy Bridge can sustain 1 AVX multiply/cycle. AVX is double the size of SSE. So that's 4 DP or 8 SP floating-point multiplies per cycle.