linuxperformanceschedulerperfcontext-switch

How does "perf sched record" count context switches?


I used perf sched record -p 8827 to record the scheduler activity for a particular program. But when I generate the latency report it shows there only being one context switch, while asking it to dump the raw events shows many switches (excerpt below). How can this possibly makes sense? I assume I'm misinterpreting things somehow? Do switches to kworker not count?

perf sched latency output:

-----------------------------------------------------------------------------------------------------------------
Task                  |   Runtime ms  | Switches | Average delay ms | Maximum delay ms | Maximum delay at       |
-----------------------------------------------------------------------------------------------------------------
:4320:4320            |      0.000 ms |        1 | avg:    0.000 ms | max:    0.000 ms | max at:      0.000000 s
-----------------------------------------------------------------------------------------------------------------
TOTAL:                |      0.000 ms |        1 |
---------------------------------------------------

perf sched script output:

test_program  8827 [008] 1727201.302534:       sched:sched_switch: test_program:8827 [100] R ==> kworker/8:0:4320 [120]
test_program  8827 [008] 1727201.303524: sched:sched_stat_runtime: comm=test_program pid=8827 runtime=954318 [ns] vruntime=30559299144
test_program  8827 [008] 1727202.303521: sched:sched_stat_runtime: comm=test_program pid=8827 runtime=999997430 [ns] vruntime=30559414
test_program  8827 [008] 1727202.303528: sched:sched_stat_runtime: comm=test_program pid=8827 runtime=6419 [ns] vruntime=3055941451126
test_program  8827 [008] 1727202.303529:       sched:sched_wakeup: kworker/8:0:4320 [120] success=1 CPU:008
test_program  8827 [008] 1727202.303532:       sched:sched_switch: test_program:8827 [100] R ==> kworker/8:0:4320 [120]
test_program  8827 [008] 1727202.303738: sched:sched_stat_runtime: comm=test_program pid=8827 runtime=192062 [ns] vruntime=30559414533
test_program  8827 [008] 1727202.303739: sched:sched_stat_runtime: comm=test_program pid=8827 runtime=908 [ns] vruntime=3055941453351
test_program  8827 [008] 1727202.304471: sched:sched_stat_runtime: comm=test_program pid=8827 runtime=731656 [ns] vruntime=30559414617
test_program  8827 [008] 1727203.304469: sched:sched_stat_runtime: comm=test_program pid=8827 runtime=999997968 [ns] vruntime=30559529
test_program  8827 [008] 1727203.304531: sched:sched_stat_runtime: comm=test_program pid=8827 runtime=62107 [ns] vruntime=305595299908
test_program  8827 [008] 1727203.304532: sched:sched_stat_runtime: comm=test_program pid=8827 runtime=1006 [ns] vruntime=3055952999092
test_program  8827 [008] 1727204.302373:       sched:sched_switch: test_program:8827 [100] R ==> kworker/8:0:4320 [120]
test_program  8827 [008] 1727204.303365: sched:sched_stat_runtime: comm=test_program pid=8827 runtime=979718 [ns] vruntime=30559645220
test_program  8827 [008] 1727205.303363: sched:sched_stat_runtime: comm=test_program pid=8827 runtime=999997445 [ns] vruntime=30559760
test_program  8827 [008] 1727205.303368: sched:sched_stat_runtime: comm=test_program pid=8827 runtime=5898 [ns] vruntime=3055976058680
test_program  8827 [008] 1727205.303370:       sched:sched_wakeup: kworker/8:0:4320 [120] success=1 CPU:008
test_program  8827 [008] 1727205.303372:       sched:sched_switch: test_program:8827 [100] R ==> kworker/8:0:4320 [120]

Solution

  • The perf sched latency 'Switches' column just contains the involuntary/nonvoluntary context switches. Whereas perf sched script lists all context switches (i.e. voluntary and involuntary ones).

    You can cross-check this number with e.g. GNU time. For example:

    perf sched record /usr/bin/time -v yourtestprogram
    

    Or monitor the number of context-switches (per kind) in /proc/$pid/status by other means.

    See also how perf sched latency counts (i.e. nb_atoms is the 'Switches' value, e.g. trace back from that function in that source file).