erlang heap-memory crash-dumps

"no next heap size found: 18446744071789822643, offset 0"


I've written a simulator that is distributed over two hosts. When I launch a few thousand processes, my main Erlang (OTP 22) virtual machine crashes after about 10 minutes and half a million events written, with this message:

no next heap size found: 18446744071789822643, offset 0.

It's always that same number - 18446744071789822643.

My server is very capable, so the crash dump is huge as well, and I can't view it on the headless server (no wx installed).

Are there any tips on what I can look at?

What would be the first things I can try out to debug this issue?


Solution

  • First, see what memory() reports in the Erlang shell:

    > memory().
    [{total,18480016},
     {processes,4615512},
     {processes_used,4614480},
     {system,13864504},
     {atom,331273},
     {atom_used,306525},
     {binary,47632},
     {code,5625561},
     {ets,438056}]
    

    Run it a few times while the simulation is running and check which value keeps growing: processes, binary, or ets?
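
    Comparing the numbers by eye gets tedious, so here is a minimal sketch that takes two erlang:memory() snapshots and lists the change per category, largest growth first. The module name mem_watch and its functions are made up for this example; they are not part of OTP.

    ```erlang
    -module(mem_watch).
    -export([delta/1, growth/2]).

    %% Take two erlang:memory() snapshots IntervalMs milliseconds apart
    %% and return the change in bytes per category, largest growth first.
    delta(IntervalMs) ->
        Before = erlang:memory(),
        timer:sleep(IntervalMs),
        After = erlang:memory(),
        growth(Before, After).

    %% Pure helper: subtract each category in Before from the same
    %% category in After, then sort descending by the difference.
    growth(Before, After) ->
        Deltas = [{Key, proplists:get_value(Key, After, 0) - Size}
                  || {Key, Size} <- Before],
        lists:reverse(lists:keysort(2, Deltas)).
    ```

    After compiling it with c(mem_watch)., something like mem_watch:delta(10000). in the shell of the crashing node should show which category grew over ten seconds.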

    If it's processes, try typing i(). in the Erlang shell while the processes are running. You'll see something like:

    Pid                   Initial Call                          Heap     Reds Msgs
    Registered            Current Function                     Stack              
    <0.0.0>               otp_ring0:start/2                      233     1263    0
    init                  init:loop/1                              2              
    <0.1.0>               erts_code_purger:start/0               233       44    0
    erts_code_purger      erts_code_purger:wait_for_request        0              
    <0.2.0>               erts_literal_area_collector:start      233        9    0
                          erts_literal_area_collector:msg_l        5              
    <0.3.0>               erts_dirty_process_signal_handler      233      128    0
                          erts_dirty_process_signal_handler        2              
    <0.4.0>               erts_dirty_process_signal_handler      233        9    0
                          erts_dirty_process_signal_handler        2              
    <0.5.0>               erts_dirty_process_signal_handler      233        9    0
                          erts_dirty_process_signal_handler        2              
    <0.8.0>               erlang:apply/2                        6772   238183    0
    erl_prim_loader       erl_prim_loader:loop/3                   5              
    

    Look for a process with a very large value in the Heap column; that is where you'd start looking for a memory leak.
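
    With a few thousand processes the i() listing is long, so it may be easier to let the node sort them for you. The following is only a sketch: the module name top_procs and the function by/2 are invented for this example, while erlang:processes/0 and erlang:process_info/2 are the standard BIFs.

    ```erlang
    -module(top_procs).
    -export([by/2]).

    %% Return the N live processes with the largest value for Item,
    %% largest first. Item is any numeric process_info item, for example
    %% total_heap_size (words), memory (bytes) or message_queue_len.
    by(Item, N) ->
        Infos = [{Size, Pid, Name, Fun}
                 || Pid <- erlang:processes(),
                    %% process_info/2 returns undefined for a process that
                    %% has already exited; the pattern then fails and the
                    %% process is skipped.
                    [{_, Size}, {_, Name}, {_, Fun}]
                        <- [erlang:process_info(
                              Pid, [Item, registered_name, current_function])]],
        lists:sublist(lists:reverse(lists:sort(Infos)), N).
    ```

    For example, top_procs:by(total_heap_size, 10). lists the ten processes with the largest heaps, and top_procs:by(message_queue_len, 10). shows whether some process has a mailbox that keeps filling up.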

    (If you weren't running headless, I'd suggest starting Observer with observer:start() and looking at what's happening in the Erlang node.)