I used the following perf
command to sample userspace read accesses to DRAM by evince
:
perf record -d --call-graph dwarf -c 100 -e mem_load_uops_retired.l3_miss:uppp /opt/evince-3.28.4/bin/evince
As can be seen, I used the PEBS
feature to increase the accuracy of sampling. But there are some non-memory accesses reported as memory ones. For example, this is a sampled event reported by perf script
:
evince 20589 16079.401401: 100 mem_load_uops_retired.l3_miss:uppp: 555555860750 5080022 N/A|SNP N/A|TLB N/A|LCK N/A
555555579939 ev_history_can_go_back+0x19 (/opt/evince-3.28.4/bin/evince)
5555555862ef ev_window_update_actions_sensitivity+0xa1f (/opt/evince-3.28.4/bin/evince)
55555558ce4f ev_window_page_changed_cb+0xf (/opt/evince-3.28.4/bin/evince)
7ffff574510c g_closure_invoke+0x19c (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
7ffff575805d signal_emit_unlocked_R+0xf4d (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
7ffff5760714 g_signal_emit_valist+0xa74 (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
7ffff576112e g_signal_emit+0x8e (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
7ffff7140d76 emit_value_changed+0xf6 (inlined)
7ffff7140d76 adjustment_set_value+0xf6 (inlined)
7ffff7140d76 gtk_adjustment_set_value_internal+0xf6 (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
7ffff574510c g_closure_invoke+0x19c (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
7ffff5757de7 signal_emit_unlocked_R+0xcd7 (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
7ffff575fc7f g_signal_emitv+0x27f (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
7ffff7153519 gtk_binding_entry_activate+0x289 (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
7ffff71539ef binding_activate+0x5f (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
7ffff7153b7f gtk_bindings_activate_list+0x17f (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
7ffff7154cd8 gtk_bindings_activate_event+0x98 (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
7ffff7973959 ev_view_key_press_event+0x59 (/opt/evince-3.28.4/lib/libevview3.so.3.0.0)
7ffff72698f6 _gtk_marshal_BOOLEAN__BOXEDv+0xa6 (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
7ffff574524f _g_closure_invoke_va+0xbf (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
7ffff57603cc g_signal_emit_valist+0x72c (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
7ffff576112e g_signal_emit+0x8e (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
7ffff73b1533 gtk_widget_event_internal+0x163 (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
7ffff73d1f0a gtk_window_propagate_key_event+0xfa (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
5555555894b1 ev_window_key_press_event+0x31 (/opt/evince-3.28.4/bin/evince)
7ffff72698f6 _gtk_marshal_BOOLEAN__BOXEDv+0xa6 (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
7ffff5745345 _g_closure_invoke_va+0x1b5 (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
7ffff57603cc g_signal_emit_valist+0x72c (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
7ffff576112e g_signal_emit+0x8e (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
7ffff73b1533 gtk_widget_event_internal+0x163 (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
7ffff726693e propagate_event+0x21e (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
7ffff7268947 gtk_main_do_event+0x7f7 (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
7ffff6d79764 _gdk_event_emit+0x24 (/usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30)
7ffff6da9f91 gdk_event_source_dispatch+0x21 (/usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30)
7ffff546a416 g_main_dispatch+0x2e6 (inlined)
7ffff546a416 g_main_context_dispatch+0x2e6 (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
7ffff546a64f g_main_context_iterate+0x1ff (inlined)
7ffff546a6db g_main_context_iteration+0x2b (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
7ffff5a2be3c g_application_run+0x1fc (/usr/lib/x86_64-linux-gnu/libgio-2.0.so.0.5600.4)
555555573707 main+0x447 (/opt/evince-3.28.4/bin/evince)
7ffff4a91b96 __libc_start_main+0xe6 (/lib/x86_64-linux-gnu/libc-2.27.so)
555555573899 _start+0x29 (/opt/evince-3.28.4/bin/evince)
ffffffffffffffff [unknown] ([unknown])
This implies that there exists an access to address 0x555555860750
(which is located in [heap]
) by an instruction at 0x555555579939
(which is located in evince
text
section at offset 0x19
of function ev_history_can_go_back()
). This memory instrucion is the last line in the following code snippet:
0000000000025920 <ev_history_can_go_back>:
25920: 53 push %rbx
25921: 48 89 fb mov %rdi,%rbx
25924: e8 67 fa ff ff callq 25390 <ev_history_get_type>
25929: 48 85 db test %rbx,%rbx
2592c: 74 42 je 25970 <ev_history_can_go_back+0x50>
2592e: 48 8b 13 mov (%rbx),%rdx
25931: 48 85 d2 test %rdx,%rdx
25934: 74 05 je 2593b <ev_history_can_go_back+0x1b>
25936: 48 39 02 cmp %rax,(%rdx)
25939: 74 0f je 2594a <ev_history_can_go_back+0x2a>
This is a jump to ev_history_can_go_back+0x2a
and, apparently, this is not an access to [heap]
at address 0x555555860750
. Is this perf
report wrong?
UPDATE
How about the following backtrace?
11159097179866 0xfb80 [0x1778]: PERF_RECORD_SAMPLE(IP, 0x4002): 7309/7309: 0x7ffff6d6c310 period: 10000 addr: 0x7ffff7034e50
... FP chain: nr:0
... user regs: mask 0xff0fff ABI 64-bit
.... AX 0x555555b8b4c0
.... BX 0x555555c48e10
.... CX 0x1
.... DX 0x7fffffffd988
.... SI 0x7fffffffd980
.... DI 0x555555b8b4c0
.... BP 0x258
.... SP 0x7fffffffd978
.... IP 0x7ffff6d6c310
.... FLAGS 0x20e
.... CS 0x33
.... SS 0x2b
.... R8 0x27c
.... R9 0x24
.... R10 0x2a2
.... R11 0x0
.... R12 0x258
.... R13 0x555555b8b4c0
.... R14 0x3000
.... R15 0x7ffff5747000
... ustack: size 5768, offset 0xd8
. data_src: 0x5080022
... thread: evince:7309
...... dso: /usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30
evince 7309 11159.097179: 10000 mem_load_uops_retired.l3_miss:uppp: 7ffff7034e50 5080022 N/A|SNP N/A|TLB N/A|LCK N/A
7ffff6d6c310 cairo_surface_get_device_scale@plt+0x0 (/usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30)
7ffff6d91029 gdk_window_create_similar_surface+0xc9 (/usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30)
7ffff6d95410 gdk_window_begin_paint_internal+0x350 (/usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30)
7ffff6d956f1 gdk_window_begin_draw_frame+0xc1 (/usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30)
7ffff73c4942 gtk_widget_render+0xd2 (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
7ffff7268858 gtk_main_do_event+0x708 (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
7ffff6d79764 _gdk_event_emit+0x24 (/usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30)
7ffff6d897f4 _gdk_window_process_updates_recurse_helper+0x104 (/usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30)
7ffff6d8a9f5 gdk_window_process_updates_internal+0x165 (/usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30)
7ffff6d8abef gdk_window_process_updates_with_mode+0x11f (/usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30)
7ffff574510c g_closure_invoke+0x19c (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
7ffff575805d signal_emit_unlocked_R+0xf4d (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
7ffff5760714 g_signal_emit_valist+0xa74 (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
7ffff576112e g_signal_emit+0x8e (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
7ffff6d82ac8 gdk_frame_clock_paint_idle+0x3c8 (/usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30)
7ffff6d6e07f gdk_threads_dispatch+0x1f (/usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30)
7ffff546ad02 g_timeout_dispatch+0x12 (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
7ffff546a284 g_main_dispatch+0x154 (inlined)
7ffff546a284 g_main_context_dispatch+0x154 (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
7ffff546a64f g_main_context_iterate+0x1ff (inlined)
7ffff546a6db g_main_context_iteration+0x2b (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
7ffff5a2be3c g_application_run+0x1fc (/usr/lib/x86_64-linux-gnu/libgio-2.0.so.0.5600.4)
555555573707 main+0x447 (/opt/evince-3.28.4/bin/evince)
7ffff4a91b96 __libc_start_main+0xe6 (/lib/x86_64-linux-gnu/libc-2.27.so)
555555573899 _start+0x29 (/opt/evince-3.28.4/bin/evince)
The access point is at offset 0
of the following disassembly:
Dump of assembler code for function cairo_surface_get_device_scale@plt:
0x000000000002a310 <+0>: jmpq *0x2c8b3a(%rip) # 0x2f2e50
0x000000000002a316 <+6>: pushq $0x1c7
0x000000000002a31b <+11>: jmpq 0x28690
This is an unconditional jump which will not lead to macrofusion.
On Intel CPUs at least, cmp %rax,(%rdx)
can macro-fuse with the following je
, while also micro-fusing the load. https://agner.org/optimize/. Also related: Micro fusion and addressing modes (this is a non-indexed addressing mode so this can stay micro-fused even on Sandybridge/IvyBridge).
So in the fused domain (where retirement happens) you really do have single-uop compare-and-branch with a memory source. Note that mem_load_uops_retired.l3_miss:uppp
counts uops, not instructions.
Even in the unfused domain, macro-fused compare/branch really executes as a single uop on a single execution unit, but the load does have to execute on a separate port. (Micro-fusion saves decode/issue front-end bandwidth, and uop cache space, but not back-end ports.)