I'm running a Java application in a container, and I want to output a heap dump when an OutOfMemoryError occurs. However, I've also heard the opinion that it is better to use JFR. Please tell me how to use these.
Also, I'm worried that the JFR dump file might be too large. The application is expected to run on ECS on Fargate. I think JFR can capture more information, but I would like your opinion, including the differences in speed, memory usage, data size, etc. Thank you.
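For reference, I understand the basic flags to be roughly the following (the paths and size limits here are placeholders I picked, not recommendations):

$ java -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/dumps/java.hprof ...
$ java -XX:StartFlightRecording:maxsize=100m,maxage=24h,filename=/dumps/app.jfr ...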
JDK Flight Recorder has the jdk.OldObjectSample event enabled by default.
$ java -XX:StartFlightRecording ...
The event provides samples of objects on the heap, spread out over time. The samples are cheap to collect, and the overhead of JFR will typically not exceed 1%.
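A simple way to inspect the collected samples afterwards is the jfr tool that ships with the JDK (recording.jfr is a placeholder filename):

$ jfr print --events jdk.OldObjectSample recording.jfr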
This may not be sufficient to find the memory leak, in which case a stack trace can be collected with each sample. Stack traces are very useful if the leak is in an ArrayList or HashMap: those collections contain an internal array that is (re)allocated as the collection grows, so the code path that added the leaking object to the collection can usually be seen.
In JDK 17 or later, stack traces can be enabled with:
$ java -XX:StartFlightRecording:memory-leaks=stack-traces ...
In releases prior to JDK 17, it's easiest to enable stack traces by using the profile configuration:
$ java -XX:StartFlightRecording:settings=profile ...
Collecting stack traces is more expensive and could in some circumstances increase the CPU usage above 1%.
Finally, it's also possible to collect the shortest path to the GC roots, which will contain the reference that shouldn't be there.
In JDK 17 or later, path to GC roots can be enabled with:
$ java -XX:StartFlightRecording:memory-leaks=gc-roots ...
In releases prior to JDK 17:
$ java -XX:StartFlightRecording:settings=profile,path-to-gc-roots=true ...
This will introduce a stop-the-world pause at the end of the recording, which could block the application for several seconds while JFR traverses the heap. The amount of data written to disk is, however, small: typically not more than a few MB.
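The same traversal can also be requested on demand when dumping an already running recording with jcmd, so the pause is only paid when a dump is actually taken (&lt;pid&gt; and leak.jfr are placeholders):

$ jcmd &lt;pid&gt; JFR.dump path-to-gc-roots=true filename=leak.jfr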
The object samples are spread out over time. The samples taken at the beginning are objects created during initialization, for example, singleton objects. Objects at the end are often short-lived objects that will soon be garbage collected. Objects in the middle are the likely memory leak candidates.
For more information, see the documentation "Troubleshoot Memory Leaks: Use java and jcmd commands to Debug a Memory Leak".
HPROF dumps in relation to JFR
The jdk.OldObjectSample event was developed to address the following shortcomings with HPROF dumps:
- The dump file can be of unmanageable size (with large heaps, potentially hundreds of gigabytes need to be serialized), causing problems both in data transfer and in later analysis when trying to open it in a GUI.
- Dumping the entire heap is a relatively long stop-the-world operation, which can be problematic, especially in production environments.
- Serializing the entire heap in order to move it off the system for analysis also risks exposing security-sensitive information (such as passwords and other sensitive data stored on the Java heap).
- An HPROF dump doesn't give contextual information, such as when a leak occurred, where the object was allocated (stack trace), or by which thread. Also, to determine the difference over time, multiple HPROF files are needed.
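That said, if a full HPROF dump is still needed from a running container, it can be taken on demand with jcmd rather than waiting for an OutOfMemoryError (&lt;pid&gt; and the output path are placeholders):

$ jcmd &lt;pid&gt; GC.heap_dump /dumps/heap.hprof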