The CUDA 6.5 documentation states: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#ixzz3PIXMTktb
5.2.3. Multiprocessor Level
...
- 8L for devices of compute capability 3.x since a multiprocessor issues a pair of instructions per warp over one clock cycle for four warps at a time, as mentioned in Compute Capability 3.x.
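The arithmetic behind the quoted "8L" can be sketched as follows. This is only an illustration of the sentence above: the function name and the latency value passed in are hypothetical placeholders, not figures taken from the quoted passage.

```python
def instructions_to_hide_latency(latency_clocks, warps_per_clock=4, instr_per_warp=2):
    """Instructions that must be in flight to cover a latency of L clocks.

    Defaults follow the quoted sentence for compute capability 3.x:
    a pair of instructions per warp, for four warps at a time, each clock.
    """
    issued_per_clock = warps_per_clock * instr_per_warp  # 4 * 2 = 8
    return issued_per_clock * latency_clocks             # 8 * L

# Illustrative latency only (assumed value, not from the quoted text).
print(instructions_to_hide_latency(11))
```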
Does this mean that Kepler (CC 3.0) GPUs are not only pipelined, but also superscalar?
Pipelining - these two sequences execute in parallel (different operations at one time):
Superscalar - these two sequences execute in parallel (the same operations at one time):
Yes, the warp schedulers in Kepler can schedule two instructions per clock, as long as:
If that fits your definition of superscalar, then it is superscalar.
With respect to pipelining, I view it differently. The various execution units in a Kepler SM are pipelined. Let's take a floating-point multiply as an example.
In a given clock, a Kepler warp scheduler may schedule a floating-point multiply operation on a floating-point unit. The result of this operation may not appear until some number of clocks later (i.e. it is not available on the next clock cycle), but on the next clock cycle a new floating-point operation can be scheduled on the very same floating-point functional units, because the hardware (the floating-point units, in this case) is pipelined.
clock    operation    pipeline stage    result
0        MPY1 ->      PS1
1                     PS2
...                   ...
N-1                   PSN ->            result1
On the very next clock after clock 0, a new multiply instruction can be scheduled on the same HW, and the corresponding result will appear on the next cycle after result1 appears.
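As a rough illustration of that timing, here is a minimal sketch of a fully pipelined unit with N stages. It is a toy model, not a description of the actual Kepler hardware; the function name and the depth of 4 are assumptions chosen only for the example.

```python
def pipeline_timeline(num_ops, depth):
    """Return (issue_clock, result_clock) for each op in a fully pipelined unit.

    Assumes one new op can enter the first stage (PS1) every clock and that
    each op spends exactly one clock in each of the `depth` stages.
    """
    timeline = []
    for i in range(num_ops):
        issue_clock = i               # op i enters PS1 at clock i
        result_clock = i + depth - 1  # it leaves the last stage at clock i + N - 1
        timeline.append((issue_clock, result_clock))
    return timeline

# Hypothetical 4-stage multiplier, three back-to-back multiplies.
for op, (issue, done) in enumerate(pipeline_timeline(3, depth=4), start=1):
    print(f"MPY{op}: issued at clock {issue}, result at clock {done}")
```

Each result takes N clocks to produce, but once the pipeline is full one result completes per clock, which is the point of the table above.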
I'm not sure if this is what you meant by "different operations at one time".