
Writing millions of records to a Mnesia table takes up a lot of memory (RAM) that is not reclaimed even after the records are deleted


I am running an Erlang application that frequently writes millions of records to a Mnesia table to implement a scheduler. When a record's time is due, it is executed and removed from the table. The table is configured with {disc_copies, [node()]} and {type, ordered_set}. I use transactions for writing records and dirty operations for deleting them.
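A minimal sketch of the setup described above, assuming a hypothetical record `job` keyed by its due time; the module name `job_table`, the record, and its fields are illustrative and not taken from the question. In Mnesia a disc-backed replica is requested with `{disc_copies, Nodes}`, which needs an on-disk schema. Writes go through `mnesia:transaction/1` and deletes use `mnesia:dirty_delete/2`; `fill/1` and `drain/0` mimic the write-then-delete experiment.

```erlang
-module(job_table).
-export([setup/0, schedule/2, delete/1, fill/1, drain/0]).

%% Hypothetical record: `due` is the key (ordered_set keeps it sorted).
-record(job, {due, task}).

setup() ->
    %% disc_copies requires a disc schema; this returns an error tuple
    %% (ignored here) if the schema already exists.
    _ = mnesia:create_schema([node()]),
    ok = mnesia:start(),
    case mnesia:create_table(job, [{disc_copies, [node()]},
                                   {type, ordered_set},
                                   {attributes, record_info(fields, job)}]) of
        {atomic, ok} -> ok;
        {aborted, {already_exists, job}} -> ok
    end,
    ok = mnesia:wait_for_tables([job], 5000).

%% Writes use a transaction, as in the question.
schedule(Due, Task) ->
    {atomic, ok} = mnesia:transaction(
                     fun() -> mnesia:write(#job{due = Due, task = Task}) end),
    ok.

%% Deletes use a dirty operation, as in the question.
delete(Due) ->
    ok = mnesia:dirty_delete(job, Due).

%% Write N jobs, one transaction each.
fill(N) ->
    lists:foreach(fun(I) -> schedule(I, noop) end, lists:seq(1, N)).

%% Walk the table in key order and delete every job.
drain() ->
    drain(mnesia:dirty_first(job)).

drain('$end_of_table') -> ok;
drain(Key) ->
    Next = mnesia:dirty_next(job, Key),   % fetch the successor before deleting
    ok = delete(Key),
    drain(Next).
```

In a real scheduler several jobs can share a due time, so a composite key such as `{DueTime, make_ref()}` (again an assumption, not from the question) would keep them distinct.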

I ran an experiment that writes 2 million records and then deletes all of them: the RAM was not reclaimed after it finished. When I start deleting those records, memory spikes to roughly twice its previous level. For example, the BEAM's memory starts at 75 MB and is 410 MB after the experiment. Using erlang:memory() to inspect memory before and after, I found that the growth was in processes_used and binary, even though my code does not work with binaries at all. If I call erlang:garbage_collect(Pid) on every running process, the memory is reclaimed down to 180 MB.
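A sketch of the measurement described in the question: snapshot a few `erlang:memory/1` categories, force a garbage collection of every process with `erlang:garbage_collect/1`, and print both snapshots. The module name `mem_probe` and the choice of categories are assumptions for illustration.

```erlang
-module(mem_probe).
-export([snapshot/0, gc_all/0, report/0]).

%% Categories reported by erlang:memory/1 that are relevant here.
-define(KEYS, [total, processes, processes_used, binary, ets]).

snapshot() ->
    [{K, erlang:memory(K)} || K <- ?KEYS].

%% Force a GC of every process. garbage_collect/1 returns false for a
%% process that has exited in the meantime; that result is ignored.
gc_all() ->
    lists:foreach(fun(P) -> _ = erlang:garbage_collect(P) end,
                  erlang:processes()).

%% Print each category in MB before and after the forced collection.
report() ->
    Before = snapshot(),
    ok = gc_all(),
    After = snapshot(),
    lists:foreach(
      fun({{K, B}, {K, A}}) ->
              io:format("~-15w ~8.1f MB -> ~8.1f MB~n",
                        [K, B / 1048576, A / 1048576])
      end,
      lists:zip(Before, After)),
    {Before, After}.
```

Forcing a collection of every process is a diagnostic step; on a busy node it costs CPU, so it is better suited to investigation than to routine use.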

Any suggestions for troubleshooting this issue would be highly appreciated. Thank you so much.


Solution

  • Answer from Rickard Green of the Erlang/OTP team:

    The above does not indicate a bug.

    A process is not garbage collected unless it reaches certain limits; for example, it needs to allocate heap data and there is no free heap available. If a process stops executing, it will not garbage collect by itself no matter how much time passes, unless it reaches one of these limits. A garbage collection can, however, be forced by calling erlang:garbage_collect().

    A process that has had a lot of live data (and has therefore grown large), but has no live data at the time of the garbage collection, won't shrink down to its original size immediately. It will instead get a relatively large heap. That heap space is free for the process to use, but from the system's point of view it is allocated. The relatively large heap is chosen to avoid triggering garbage collections unnecessarily often.
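An illustrative sketch of the two points above (not part of the quoted answer): one process builds a large list, drops it, and then waits in a `receive`, so it allocates nothing and never collects on its own; its `total_heap_size` (in words) is compared with a fresh process before and after a forced `erlang:garbage_collect/1`. The module name `heap_demo` and the list size are arbitrary assumptions, and the exact numbers depend on the ERTS version.

```erlang
-module(heap_demo).
-export([run/0]).

run() ->
    Parent = self(),
    Fresh = spawn(fun idle/0),
    Grown = spawn(fun() ->
                          _ = build(1000000),       % grows the heap; result discarded
                          Parent ! {self(), built},
                          idle()                     % allocates nothing, so no GC
                  end),
    receive {Grown, built} -> ok end,
    Initial = heap_words(Fresh),
    IdleSize = heap_words(Grown),                    % still large: garbage not collected
    true = erlang:garbage_collect(Grown),            % forced collection
    AfterGc = heap_words(Grown),
    lists:foreach(fun(P) -> exit(P, kill) end, [Fresh, Grown]),
    io:format("fresh: ~w, idle: ~w, after forced gc: ~w (words)~n",
              [Initial, IdleSize, AfterGc]),
    {Initial, IdleSize, AfterGc}.

heap_words(Pid) ->
    {total_heap_size, Words} = erlang:process_info(Pid, total_heap_size),
    Words.

build(N) -> lists:seq(1, N).

idle() -> receive stop -> ok end.
```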

    Your own processes are not the only ones affected when you run this. Other processes might also build up heap in order to serve your processes.
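One way to see which processes grew, whether yours or ones serving them (for example, processes registered by Mnesia), is to list the largest processes by memory before and after the experiment. This is a sketch with a hypothetical module name `top_procs`; processes that exit during the scan are skipped because `erlang:process_info/2` returns `undefined` for them.

```erlang
-module(top_procs).
-export([top/1]).

%% Return the N processes using the most memory, as {Bytes, Pid, Name}.
%% Name is [] for processes that are not registered.
top(N) ->
    Infos = [{Mem, P, Name}
             || P <- erlang:processes(),
                %% `undefined` (dead process) does not match and is filtered out
                [{memory, Mem}, {registered_name, Name}] <-
                    [erlang:process_info(P, [memory, registered_name])]],
    lists:sublist(lists:reverse(lists:sort(Infos)), N).
```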

    If you look at memory consumption via top or similar tools, it is also expected that memory usage will have increased after the run, even if you are able to garbage collect every process down to its initial size. This is because memory allocators place memory blocks into larger chunks of memory, and a chunk cannot be released until all of it is free. More or less every memory allocation system in existence has this characteristic.
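To see the gap between what the VM accounts for and what the OS reports, a rough sketch (not from the answer) compares `erlang:memory(total)` with the resident set size from `ps`. It assumes a Unix-like system with `ps` on the PATH; the module name `rss_check` is hypothetical.

```erlang
-module(rss_check).
-export([compare/0]).

%% Return {VmBytes, RssBytes}: memory accounted by the VM versus the
%% resident set size the OS reports for this BEAM process.
compare() ->
    VmBytes = erlang:memory(total),
    Out = os:cmd("ps -o rss= -p " ++ os:getpid()),   % RSS in kilobytes
    RssKb = list_to_integer(string:trim(Out)),
    io:format("VM total: ~.1f MB, OS RSS: ~.1f MB~n",
              [VmBytes / 1048576, RssKb / 1024]),
    {VmBytes, RssKb * 1024}.
```

Running it before and after the experiment (and after a forced collection) shows whether the remaining difference sits inside the VM's accounting or only at the OS level.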