hpcsupercomputers

Application performance vs Peak performance


I have questions about real application performance running on a cluster vs cluster peak performance.

Let's say one HPC cluster report that it has peak performance of 1 Petaflops. How is this calculated? To me, it seems that there are two measuring matrixes. One is the performance calculated based on the hardware. The other one is from running HPL? Is my understanding correct? When I am reading one real application running on the system at full scale, the developer mentions that it could achieve 10% of the peak performance. How is this measured and why it can't achieve peak performance?

Thanks


Solution

  • Peak performance is what the system is theoretically able to deliver. It is the product of the total number of CPU cores, the core clock frequency, and the number of FLOPs one core makes per clock tick. That performance can never be reached in practice because no real application consists of 100% fully vectorised tight loops that only operate on data held in the L1 data cache. In many cases data doesn't even fit in the last-level cache and the memory interface is usually not fast enough to deliver data at the same rate at which the CPU is able to process it. One ubiquitous example from HPC is the multiplication of a sparse matrix with a vector. It is so memory intensive (i.e. many loads and stores per arithmetic operation) that on many platforms it only achieves a fraction of the peak performance.

    Things get even worse when multiple nodes are networked together on a massive scale as data transfers could introduce huge additional delays. Performance in those cases is determined mainly by the ratio of local data processing and data transfer. HPL is a particularly good in that aspect - it does a lot of vectorised local processing and does not move much data across the CPUs/nodes. That's not the case with many real-world parallel programs and also the reason why many are questioning the applicability of HPL in assessing cluster performance nowadays. Alternative benchmarks are already emerging, for example the HPCG benchmark (from the people who brought you HPL).