r linux memory-leaks julia valgrind

Which Valgrind tool and option to use for investigation of RAM allocation for each function?


I have never used Valgrind, but I think this tool can help me with my question. I would be grateful for any help.

In my R code, I use the MixedModels Julia package.
I integrate Julia in R using the JuliaCall package.
I work with very large datasets (~1 GB, ~4x10^6 observations). At the modeling step (mixed models) a lot of RAM is allocated (~130 GB), and most of it is not returned to the system after the calculations finish.

I would like to analyze the code and see the whole stack of R and Julia functions.
It is very important for me to understand which functions are called during the mixed-model calculation with Julia (especially low-level functions, most likely written in C/C++), and how much memory each of these functions uses.

It is also important to understand what exactly the memory is spent on, what exactly happens in the RAM when the functions from the MixedModels package are running.

Perhaps understanding this will help me improve the performance of the code and reduce the memory allocation.

If some other tool would suit this task better than Valgrind, I would be very grateful for recommendations.


Solution

  • Here is an example of valgrind --tool=massif in use, taken from Git 2.38 (Q3 2022). It is not related to R or Julia, but it illustrates what the tool reports.

    See commit 51d1b69 (26 Jul 2022) by Jeff King (peff).
    See commit 068fa54, commit 90b2bb7, commit 5766524 (19 Jul 2022) by Derrick Stolee (derrickstolee).
    (Merged by Junio C Hamano -- gitster -- in commit acbec18, 03 Aug 2022)

    The codepath to write multi-pack index (introduced here) has been taught to release a large chunk of memory that holds an array of objects in the packs, as soon as it is done with the array, to reduce memory consumption.

    midx: reduce memory pressure while writing bitmaps

    Signed-off-by: Derrick Stolee

    We noticed that some 'git multi-pack-index write --bitmap' processes were running with very high memory.
    It turns out that a lot of this memory is required to store a list of every object in the written multi-pack-index, with a second copy that has additional information used for the bitmap writing logic.

    Using 'valgrind --tool=massif' before this change, the following chart shows how memory load increased and was maintained throughout the process:

    GB
    ^ 4.102                                                       ::
    |              @  @::@@::@@::::::::@::::::@@:#:::::::::::::@@:: :
    |         :::::@@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
    |      :::: :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
    |    :::: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
    |    : :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
    |    : :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
    |   :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
    |   :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
    |   :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
    |   :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
    |   :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
    |   :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
    |   :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
    | @ :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
    | @ :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
    | @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
    | @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
    | @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
    | @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
    +--------------------------------------------------------------->
    

    It turns out that the 'struct write_midx_context' data is persisting through the life of the process, including the 'entries' array.
    This array is used last inside find_commits_for_midx_bitmap() within write_midx_bitmap().

    If we free (and nullify) the array at that point, we can free a decent chunk of memory before the bitmap logic adds more to the memory footprint.

    Here is the massif memory load chart after this change:

    GB
    ^ 3.111#
    |      #                              :::::::::::@::::::::::::::@
    |      #        ::::::::::::::::::::::::: : :: : @:: ::::: :: ::@
    |     @#  :::::::::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
    |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
    |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
    |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
    |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
    |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
    |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
    |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
    |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
    |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
    |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
    |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
    |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
    |  :::@#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
    |  :: @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
    |  :: @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
    |  :: @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
    +--------------------------------------------------------------->
    

    The previous change introduced a refactoring of write_midx_bitmap() to make it more clear how much of the 'struct write_midx_context' instance is needed at different parts of the process.
    In addition, the following defensive programming measures were put in place:

    1. Using FREE_AND_NULL() we will at least get a segfault from reading a NULL pointer instead of a use-after-free.
    2. 'entries_nr' is also set to zero to make any loop that would iterate over the entries be trivial.
    3. Add significant comments in write_midx_internal() to add warnings for future authors who might accidentally add references to this cleared memory.
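The free-and-nullify pattern described in the commit message can be sketched in C as follows. This is a minimal illustration, not the code from the Git commit: the FREE_AND_NULL macro is written along the lines of the one Git uses, while `struct demo_ctx` and the `demo_*` functions are hypothetical names invented for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Free the pointer and reset it, so a later stale read hits NULL
 * instead of silently touching freed memory. */
#define FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)

/* Hypothetical stand-in for a long-lived context that owns a big array. */
struct demo_ctx {
    uint32_t *entries;
    size_t entries_nr;
};

/* Allocate n zeroed entries; returns 0 on success, -1 on failure. */
static int demo_fill(struct demo_ctx *ctx, size_t n)
{
    assert(ctx != NULL);
    ctx->entries = calloc(n, sizeof(*ctx->entries));
    if (!ctx->entries) {
        ctx->entries_nr = 0;
        return -1;
    }
    ctx->entries_nr = n;
    return 0;
}

/* Release the array as soon as its last user is done with it. */
static void demo_release_entries(struct demo_ctx *ctx)
{
    assert(ctx != NULL);
    FREE_AND_NULL(ctx->entries);
    ctx->entries_nr = 0; /* any later loop over the entries becomes a no-op */
}

/* A later consumer: safe after release because entries_nr is zero. */
static size_t demo_count_nonzero(const struct demo_ctx *ctx)
{
    size_t i, hits = 0;

    for (i = 0; i < ctx->entries_nr; i++)
        if (ctx->entries[i])
            hits++;
    return hits;
}
```

Calling `demo_release_entries()` right after the last real use of the array, rather than at process exit, is what lowers the peak in a massif chart: the memory is handed back to the allocator before later phases allocate their own buffers.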

    Note that valgrind --tool=massif, as its documentation mentions, measures only heap memory by default, i.e. memory allocated with malloc, calloc, realloc, memalign, new, new[], and a few other, similar functions.
    This means it does not directly measure memory allocated with lower-level system calls such as mmap, mremap, and brk.
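To make the heap versus mmap distinction concrete, here is a small C sketch with one allocation of each kind. The `demo_*` function names are hypothetical. The massif option mentioned in the comments, --pages-as-heap=yes, is the documented way to make massif account for memory at the page level (mmap, brk, and so on) instead of only intercepting malloc-family calls.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std= modes on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

/* Heap allocation through malloc: massif intercepts this call,
 * so the block shows up in a default massif profile. */
static void *demo_heap_alloc(size_t len)
{
    void *p;

    assert(len > 0);
    p = malloc(len);
    if (p)
        memset(p, 1, len); /* touch the pages so they are really resident */
    return p;
}

/* Anonymous mapping obtained directly from the kernel: this bypasses
 * malloc, so a default massif run does not attribute it to the heap.
 * Running with --pages-as-heap=yes makes massif count it. */
static void *demo_mmap_alloc(size_t len)
{
    void *p;

    assert(len > 0);
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    memset(p, 1, len);
    return p;
}

/* Unmap a region returned by demo_mmap_alloc(); returns 0 on success. */
static int demo_mmap_free(void *p, size_t len)
{
    return munmap(p, len);
}
```

A program built around these two functions could be profiled once with `valgrind --tool=massif ./demo` and once with `valgrind --tool=massif --pages-as-heap=yes ./demo`, then each `massif.out.<pid>` file rendered with `ms_print`. If a runtime obtains most of its memory through mmap rather than malloc, only the second run would be expected to reflect it, which may matter when the total reported by massif is far below what the operating system shows for the process.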

    For more, see "What is the difference between 'time -f "%M"' and 'valgrind --tool=massif'?".