cpu-usage multicore hyperthreading

CPU time on multicore/hyper-threaded processors


I need to measure the CPU time taken by a process on a multicore/hyper-threaded machine. Suppose a Xeon, Opteron, etc.

Let's assume I have 4 cores, hyper-threaded, meaning 8 'virtual' cores. Let X be the program I want to run while observing how much CPU time it takes.

Questions:

  1. What's the relationship between numbers A, Bi, Ci, Di?

  2. Is A smaller than Bi? By how much? What about Ci, Di?

  3. Do the Bi times differ from one another? What about Ci, Di?


Solution

  • What's the relationship between numbers A, Bi, Ci, Di?

    Expect D1=D2=D3=D4=A*1, unless you run into L2 cache issues (conflicts, faults, ...), in which case the factor will be slightly greater than 1.

    Expect B1=B2=B3=B4=...=B8=A*1.3. The factor 1.3 may vary between 1.1 and 2 depending on your application (certain parts of the processor are hyper-threaded, others are not). It was computed from similar statistics, which I give here using the notation of the question: D=23 seconds and A=18 seconds, according to a private forum. The single-threaded process did integer computations without input/output. The exact application was checking Adem coefficients in the motivic Steenrod algebra (I don't know what it is; the settings were (2n+e,n) with n=20).
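The factor is easy to estimate on your own machine rather than taking 1.3 on faith. Below is a minimal sketch, assuming a Python interpreter is available; the `WORKER` loop is a hypothetical stand-in for program X (integer arithmetic, no I/O), and `cpu_times` and `slowdown` are names invented for this illustration. Each worker reports its own CPU time, so the ratio compares CPU seconds, not wall-clock seconds. With the figures quoted above, 23 s against 18 s, the ratio is 23/18, roughly 1.28.

```python
import os
import subprocess
import sys

# Hypothetical stand-in for program X: an integer-only loop with no I/O
# that prints the CPU seconds it consumed (interpreter startup included).
WORKER = """
import sys, time
acc = 0
for i in range(int(sys.argv[1])):
    acc = (acc * 31 + i) % 1000003
print(time.process_time())
"""

def cpu_times(copies, n=1_000_000):
    """Start `copies` workers at once and return each one's CPU time in seconds."""
    procs = [
        subprocess.Popen([sys.executable, "-c", WORKER, str(n)],
                         stdout=subprocess.PIPE, text=True)
        for _ in range(copies)
    ]
    return [float(p.communicate()[0]) for p in procs]

def slowdown(alone, together):
    """Mean per-process CPU time under load divided by the solo CPU time."""
    return (sum(together) / len(together)) / alone

if __name__ == "__main__":
    a = cpu_times(1)[0]                    # one copy alone: the "A" case
    b = cpu_times(os.cpu_count() or 1)     # one copy per logical CPU: the "B" case
    print(f"A = {a:.2f} s, mean B = {sum(b) / len(b):.2f} s, "
          f"factor = {slowdown(a, b):.2f}")
```

The default `n` keeps the run short; raise it until a single worker takes several seconds, otherwise startup cost and scheduler noise dominate the result.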

    In the case of seven processes (the Cs), if you pin each process to a core (with /usr/bin/htop on Linux), one of the processes (C5, for example) will have the same execution time as A, and the others (in my example C1, C2, C3, C4, C6, C7) will have the same values as the Ds. If you do not pin the processes to cores, and they run long enough for the OS to balance them across the cores, their times will converge to the mean of the Cs.
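To see which of the seven pinned processes gets a physical core to itself, you need to know which logical CPUs are hyper-thread siblings. The sketch below is Linux-only and makes several assumptions: it reads the sysfs file `thread_siblings_list`, pins with `os.sched_setaffinity`, and the helper names (`parse_cpu_list`, `thread_siblings`, `alone_on_core`, `pin`) are invented for this illustration. The topology in the demo, where CPU k and CPU k+4 share a core, is hypothetical; check your own machine, since the numbering varies. It also assumes each process is pinned to a distinct logical CPU.

```python
import os

def parse_cpu_list(text):
    """Parse a sysfs CPU list such as "0,4" or "0-1,4" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus

def thread_siblings(cpu):
    """Logical CPUs sharing a physical core with `cpu` (read from Linux sysfs)."""
    path = f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list"
    with open(path) as f:
        return parse_cpu_list(f.read())

def alone_on_core(assignment, siblings_of):
    """Names of processes whose sibling logical CPUs are all unused.

    assignment:  {process name: logical CPU it is pinned to}
    siblings_of: {logical CPU: set of logical CPUs on the same core}
    """
    used = set(assignment.values())
    return sorted(name for name, cpu in assignment.items()
                  if not (siblings_of[cpu] - {cpu}) & used)

def pin(cpu, pid=0):
    """Pin process `pid` (0 means the caller) to one logical CPU; Linux only."""
    os.sched_setaffinity(pid, {cpu})

if __name__ == "__main__":
    # Hypothetical 4-core, 8-thread layout: CPU k and CPU k+4 share a core.
    siblings_of = {k: {k % 4, k % 4 + 4} for k in range(8)}
    # Seven processes; logical CPU 6 (the sibling of CPU 2) stays idle.
    assignment = {"C1": 0, "C2": 4, "C3": 1, "C4": 5, "C5": 2, "C6": 3, "C7": 7}
    print(alone_on_core(assignment, siblings_of))
```

On a real machine, `siblings_of` would be built from `thread_siblings(cpu)` for each CPU, and each worker would call `pin(cpu)` at startup. From a shell, `taskset -c 2 ./X` does the same job as pinning in htop.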

  • Do the Bi times differ from one another? What about Ci, Di?

    It depends on your OS scheduler and its configuration. Also, the percentage shown by /bin/top on Linux is misleading: it will show nearly 100% for A and for the Bs, Cs and Ds alike.
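The percentage is misleading because it is only the share of elapsed time that the process spent scheduled on a logical CPU. A process slowed down by a busy hyper-thread sibling is still scheduled all the time, so it shows close to 100% while accumulating more CPU seconds. A minimal sketch of the distinction, where `busy` is a hypothetical stand-in for program X and `measure` is a helper invented for this illustration:

```python
import time

def busy(n):
    """CPU-bound integer loop, a hypothetical stand-in for program X."""
    acc = 0
    for i in range(n):
        acc = (acc * 31 + i) % 1000003
    return acc

def measure(fn, *args):
    """Return (CPU seconds, wall seconds, percent CPU) for one call of fn."""
    wall0, cpu0 = time.perf_counter(), time.process_time()
    fn(*args)
    cpu = time.process_time() - cpu0
    wall = time.perf_counter() - wall0
    return cpu, wall, 100.0 * cpu / wall

if __name__ == "__main__":
    # A CPU-bound call shows a high percentage, a sleeping one a low
    # percentage; neither figure says how fast the work progressed.
    for label, fn, arg in (("busy", busy, 2_000_000), ("sleep", time.sleep, 0.5)):
        cpu, wall, pct = measure(fn, arg)
        print(f"{label}: cpu={cpu:.2f}s wall={wall:.2f}s cpu%={pct:.0f}")
```

To compare A with the Bs, Cs and Ds, compare the CPU seconds (or the wall-clock time for a fixed amount of work), not the percentage.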

    To assess performance, don't forget /usr/bin/nettop (and variants nethogs, nmon, iftop, iptraf), iotop (and variants iostat, latencytop), collectl (+colmux) and sar (+sag, +sadf).