linux x86-64 performancecounter perf memory-access

Perf Reports Some Direct Jump Instructions as Memory Access Instructions


I used the following perf command to sample evince's user-space loads that miss in the L3 cache, i.e., read accesses that go to DRAM:

perf record -d --call-graph dwarf -c 100 -e mem_load_uops_retired.l3_miss:uppp /opt/evince-3.28.4/bin/evince

As the ppp modifier shows, I used PEBS to increase sampling precision. However, some instructions that do not access memory are reported as memory accesses. For example, this is one sample reported by perf script:

evince 20589 16079.401401:        100 mem_load_uops_retired.l3_miss:uppp:     555555860750         5080022 N/A|SNP N/A|TLB N/A|LCK N/A
    555555579939 ev_history_can_go_back+0x19 (/opt/evince-3.28.4/bin/evince)
    5555555862ef ev_window_update_actions_sensitivity+0xa1f (/opt/evince-3.28.4/bin/evince)
    55555558ce4f ev_window_page_changed_cb+0xf (/opt/evince-3.28.4/bin/evince)
    7ffff574510c g_closure_invoke+0x19c (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
    7ffff575805d signal_emit_unlocked_R+0xf4d (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
    7ffff5760714 g_signal_emit_valist+0xa74 (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
    7ffff576112e g_signal_emit+0x8e (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
    7ffff7140d76 emit_value_changed+0xf6 (inlined)
    7ffff7140d76 adjustment_set_value+0xf6 (inlined)
    7ffff7140d76 gtk_adjustment_set_value_internal+0xf6 (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
    7ffff574510c g_closure_invoke+0x19c (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
    7ffff5757de7 signal_emit_unlocked_R+0xcd7 (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
    7ffff575fc7f g_signal_emitv+0x27f (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
    7ffff7153519 gtk_binding_entry_activate+0x289 (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
    7ffff71539ef binding_activate+0x5f (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
    7ffff7153b7f gtk_bindings_activate_list+0x17f (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
    7ffff7154cd8 gtk_bindings_activate_event+0x98 (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
    7ffff7973959 ev_view_key_press_event+0x59 (/opt/evince-3.28.4/lib/libevview3.so.3.0.0)
    7ffff72698f6 _gtk_marshal_BOOLEAN__BOXEDv+0xa6 (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
    7ffff574524f _g_closure_invoke_va+0xbf (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
    7ffff57603cc g_signal_emit_valist+0x72c (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
    7ffff576112e g_signal_emit+0x8e (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
    7ffff73b1533 gtk_widget_event_internal+0x163 (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
    7ffff73d1f0a gtk_window_propagate_key_event+0xfa (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
    5555555894b1 ev_window_key_press_event+0x31 (/opt/evince-3.28.4/bin/evince)
    7ffff72698f6 _gtk_marshal_BOOLEAN__BOXEDv+0xa6 (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
    7ffff5745345 _g_closure_invoke_va+0x1b5 (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
    7ffff57603cc g_signal_emit_valist+0x72c (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
    7ffff576112e g_signal_emit+0x8e (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
    7ffff73b1533 gtk_widget_event_internal+0x163 (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
    7ffff726693e propagate_event+0x21e (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
    7ffff7268947 gtk_main_do_event+0x7f7 (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
    7ffff6d79764 _gdk_event_emit+0x24 (/usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30)
    7ffff6da9f91 gdk_event_source_dispatch+0x21 (/usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30)
    7ffff546a416 g_main_dispatch+0x2e6 (inlined)
    7ffff546a416 g_main_context_dispatch+0x2e6 (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
    7ffff546a64f g_main_context_iterate+0x1ff (inlined)
    7ffff546a6db g_main_context_iteration+0x2b (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
    7ffff5a2be3c g_application_run+0x1fc (/usr/lib/x86_64-linux-gnu/libgio-2.0.so.0.5600.4)
    555555573707 main+0x447 (/opt/evince-3.28.4/bin/evince)
    7ffff4a91b96 __libc_start_main+0xe6 (/lib/x86_64-linux-gnu/libc-2.27.so)
    555555573899 _start+0x29 (/opt/evince-3.28.4/bin/evince)
ffffffffffffffff [unknown] ([unknown])
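The raw data_src value that perf script prints after the sampled address (5080022, which is also the value in the later sample) can be decoded by hand. Below is a minimal Python sketch. It assumes the bit layout of union perf_mem_data_src from the kernel header include/uapi/linux/perf_event.h and decodes only the older fields. The helper names are illustrative only.

```python
# Decode a raw perf data_src value (as printed by perf script) into fields.
# Assumption: bit layout of union perf_mem_data_src from
# include/uapi/linux/perf_event.h; newer fields (remote, blk, hops) are skipped.
FIELDS = [  # (field name, bit shift, bit width)
    ("mem_op", 0, 5),
    ("mem_lvl", 5, 14),
    ("mem_snoop", 19, 5),
    ("mem_lock", 24, 2),
    ("mem_dtlb", 26, 7),
    ("mem_lvl_num", 33, 4),
]

# PERF_MEM_OP_* bits of the mem_op field
OP_NAMES = {0x01: "NA", 0x02: "LOAD", 0x04: "STORE", 0x08: "PFETCH", 0x10: "EXEC"}


def decode_data_src(val):
    """Split a raw data_src value into its bit-fields."""
    return {name: (val >> shift) & ((1 << width) - 1) for name, shift, width in FIELDS}


def op_names(mem_op):
    """Translate the mem_op bitmask into PERF_MEM_OP_* suffixes."""
    return [name for bit, name in OP_NAMES.items() if mem_op & bit]


if __name__ == "__main__":
    fields = decode_data_src(0x5080022)  # data_src printed in the sample
    print(fields)
    print(op_names(fields["mem_op"]))
```

For 0x5080022 this gives mem_op = 0x2 (LOAD). The mem_lvl, mem_snoop, mem_lock and mem_dtlb fields are each 0x1, which is the N/A bit and matches the N/A|SNP N/A|TLB N/A|LCK N/A string. So the record is tagged as a load, but its cache level, snoop, lock and TLB details are all marked unavailable.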

This sample implies that the instruction at 0x555555579939 (offset 0x19 into ev_history_can_go_back() in evince's text section) accessed address 0x555555860750, which is located in [heap]. That instruction is the last line of the following snippet:

0000000000025920 <ev_history_can_go_back>:
   25920:       53                      push   %rbx
   25921:       48 89 fb                mov    %rdi,%rbx
   25924:       e8 67 fa ff ff          callq  25390 <ev_history_get_type>
   25929:       48 85 db                test   %rbx,%rbx
   2592c:       74 42                   je     25970 <ev_history_can_go_back+0x50>
   2592e:       48 8b 13                mov    (%rbx),%rdx
   25931:       48 85 d2                test   %rdx,%rdx
   25934:       74 05                   je     2593b <ev_history_can_go_back+0x1b>
   25936:       48 39 02                cmp    %rax,(%rdx)
   25939:       74 0f                   je     2594a <ev_history_can_go_back+0x2a>

The instruction at offset 0x19 is a conditional jump (je) to ev_history_can_go_back+0x2a, and it does not appear to access [heap] at address 0x555555860750. Is perf reporting this sample incorrectly?
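The mapping from the sampled IP to the objdump listing can be double-checked mechanically. Below is a rough Python sketch. It assumes only the numbers shown above: the symbol start 0x25920, the +0x19 offset from perf script, and the sampled IP. It parses the listing, finds the instruction at the computed offset, and prints the instruction right before it. parse_listing and locate are hypothetical helpers, not part of any tool.

```python
import re

# Instruction lines of the objdump listing from the question.
LISTING = """\
   25920:       53                      push   %rbx
   25921:       48 89 fb                mov    %rdi,%rbx
   25924:       e8 67 fa ff ff          callq  25390 <ev_history_get_type>
   25929:       48 85 db                test   %rbx,%rbx
   2592c:       74 42                   je     25970 <ev_history_can_go_back+0x50>
   2592e:       48 8b 13                mov    (%rbx),%rdx
   25931:       48 85 d2                test   %rdx,%rdx
   25934:       74 05                   je     2593b <ev_history_can_go_back+0x1b>
   25936:       48 39 02                cmp    %rax,(%rdx)
   25939:       74 0f                   je     2594a <ev_history_can_go_back+0x2a>
"""

# "offset:  hex bytes  mnemonic operands"
INSN_RE = re.compile(r"^\s*([0-9a-f]+):\s+((?:[0-9a-f]{2} )+)\s*(.+)$")

FUNC_START = 0x25920        # <ev_history_can_go_back> in objdump
SYMBOL_OFFSET = 0x19        # ev_history_can_go_back+0x19 in perf script
SAMPLE_IP = 0x555555579939  # sampled IP


def parse_listing(text):
    """Return (offset, length in bytes, text) for every instruction line."""
    insns = []
    for line in text.splitlines():
        m = INSN_RE.match(line)
        if m:
            insns.append((int(m.group(1), 16), len(m.group(2).split()), m.group(3).strip()))
    return insns


def locate(insns, offset):
    """Return (previous instruction, instruction starting at offset)."""
    for i, insn in enumerate(insns):
        if insn[0] == offset:
            return (insns[i - 1] if i > 0 else None), insn
    raise ValueError("0x%x is not an instruction boundary" % offset)


if __name__ == "__main__":
    file_off = FUNC_START + SYMBOL_OFFSET
    print("file offset 0x%x, load base 0x%x" % (file_off, SAMPLE_IP - file_off))
    prev, insn = locate(parse_listing(LISTING), file_off)
    print("previous:", prev)
    print("sampled: ", insn)
```

The offset lands on an instruction boundary (0x25939, the je), and the instruction immediately before it is the memory-source cmp %rax,(%rdx) at 0x25936.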


UPDATE

What about the following sample?

11159097179866 0xfb80 [0x1778]: PERF_RECORD_SAMPLE(IP, 0x4002): 7309/7309: 0x7ffff6d6c310 period: 10000 addr: 0x7ffff7034e50
... FP chain: nr:0
... user regs: mask 0xff0fff ABI 64-bit
.... AX    0x555555b8b4c0
.... BX    0x555555c48e10
.... CX    0x1
.... DX    0x7fffffffd988
.... SI    0x7fffffffd980
.... DI    0x555555b8b4c0
.... BP    0x258
.... SP    0x7fffffffd978
.... IP    0x7ffff6d6c310
.... FLAGS 0x20e
.... CS    0x33
.... SS    0x2b
.... R8    0x27c
.... R9    0x24
.... R10   0x2a2
.... R11   0x0
.... R12   0x258
.... R13   0x555555b8b4c0
.... R14   0x3000
.... R15   0x7ffff5747000
... ustack: size 5768, offset 0xd8
 . data_src: 0x5080022
 ... thread: evince:7309
 ...... dso: /usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30
evince  7309 11159.097179:      10000    mem_load_uops_retired.l3_miss:uppp:     7ffff7034e50         5080022 N/A|SNP N/A|TLB N/A|LCK N/A
        7ffff6d6c310 cairo_surface_get_device_scale@plt+0x0 (/usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30)
        7ffff6d91029 gdk_window_create_similar_surface+0xc9 (/usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30)
        7ffff6d95410 gdk_window_begin_paint_internal+0x350 (/usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30)
        7ffff6d956f1 gdk_window_begin_draw_frame+0xc1 (/usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30)
        7ffff73c4942 gtk_widget_render+0xd2 (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
        7ffff7268858 gtk_main_do_event+0x708 (/usr/lib/x86_64-linux-gnu/libgtk-3.so.0.2200.30)
        7ffff6d79764 _gdk_event_emit+0x24 (/usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30)
        7ffff6d897f4 _gdk_window_process_updates_recurse_helper+0x104 (/usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30)
        7ffff6d8a9f5 gdk_window_process_updates_internal+0x165 (/usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30)
        7ffff6d8abef gdk_window_process_updates_with_mode+0x11f (/usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30)
        7ffff574510c g_closure_invoke+0x19c (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
        7ffff575805d signal_emit_unlocked_R+0xf4d (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
        7ffff5760714 g_signal_emit_valist+0xa74 (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
        7ffff576112e g_signal_emit+0x8e (/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.5600.4)
        7ffff6d82ac8 gdk_frame_clock_paint_idle+0x3c8 (/usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30)
        7ffff6d6e07f gdk_threads_dispatch+0x1f (/usr/lib/x86_64-linux-gnu/libgdk-3.so.0.2200.30)
        7ffff546ad02 g_timeout_dispatch+0x12 (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
        7ffff546a284 g_main_dispatch+0x154 (inlined)
        7ffff546a284 g_main_context_dispatch+0x154 (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
        7ffff546a64f g_main_context_iterate+0x1ff (inlined)
        7ffff546a6db g_main_context_iteration+0x2b (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.4)
        7ffff5a2be3c g_application_run+0x1fc (/usr/lib/x86_64-linux-gnu/libgio-2.0.so.0.5600.4)
        555555573707 main+0x447 (/opt/evince-3.28.4/bin/evince)
        7ffff4a91b96 __libc_start_main+0xe6 (/lib/x86_64-linux-gnu/libc-2.27.so)
        555555573899 _start+0x29 (/opt/evince-3.28.4/bin/evince)

The sampled IP is at offset 0 of the following disassembly:

Dump of assembler code for function cairo_surface_get_device_scale@plt:
   0x000000000002a310 <+0>:     jmpq   *0x2c8b3a(%rip)        # 0x2f2e50
   0x000000000002a316 <+6>:     pushq  $0x1c7
   0x000000000002a31b <+11>:    jmpq   0x28690

This is an unconditional jump, so it cannot be macro-fused with a preceding instruction.


Solution

  • On Intel CPUs at least, cmp %rax,(%rdx) can macro-fuse with the following je while also micro-fusing the load (see https://agner.org/optimize/). Also related: "Micro fusion and addressing modes". (%rdx) is a non-indexed addressing mode, so the load can stay micro-fused even on Sandy Bridge/Ivy Bridge.

    So in the fused domain (where retirement happens), you really do have a single-uop compare-and-branch with a memory source, which is why the load can be reported at the address of the je. Note that mem_load_uops_retired.l3_miss:uppp counts uops, not instructions.

    Even in the unfused domain, a macro-fused compare-and-branch executes as a single uop on a single execution unit, but the load still has to execute on a separate port. (Micro-fusion saves front-end decode/issue bandwidth and uop-cache space, but not back-end ports.)
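The PLT sample in the update can be checked with plain address arithmetic. jmpq *0x2c8b3a(%rip) is a memory-indirect jump. It reads its target from the GOT slot that gdb annotates as # 0x2f2e50, so it does contain a load (on Intel it is typically micro-fused with the jump). The sketch below uses only numbers printed in the question and the standard rule that a RIP-relative operand is relative to the address of the next instruction. rip_relative_target is a hypothetical helper name.

```python
# RIP-relative operands are computed from the address of the *next* instruction.
# All numbers come from the update: gdb's disassembly of
# cairo_surface_get_device_scale@plt and the PERF_RECORD_SAMPLE fields.
PLT_OFFSET = 0x2a310        # <+0>: jmpq *0x2c8b3a(%rip)
NEXT_OFFSET = 0x2a316       # <+6>: pushq $0x1c7, so the jmpq is 6 bytes long
DISP = 0x2c8b3a             # disp32 of the jmpq memory operand
SAMPLE_IP = 0x7ffff6d6c310  # IP in the sample
SAMPLE_ADDR = 0x7ffff7034e50  # addr in the same sample


def rip_relative_target(insn_addr, insn_len, disp):
    """Address referenced by a disp32(%rip) memory operand."""
    return insn_addr + insn_len + disp


if __name__ == "__main__":
    insn_len = NEXT_OFFSET - PLT_OFFSET
    # Static address inside the library; gdb shows it as "# 0x2f2e50".
    print(hex(rip_relative_target(PLT_OFFSET, insn_len, DISP)))
    # Runtime address, computed from the sampled IP of the same instruction.
    got = rip_relative_target(SAMPLE_IP, insn_len, DISP)
    print(hex(got), got == SAMPLE_ADDR)
```

Working through the numbers, 0x7ffff6d6c310 + 6 + 0x2c8b3a = 0x7ffff7034e50, which is exactly the sampled address. That suggests this sample is the jmpq's load of the GOT entry rather than a mis-attributed non-memory instruction.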