javajvmwebspherehprofj9

Profiling WebSphere with hprof for CPU SAMPLES output


I'm trying to profile WebSphere using hprof over IBM stack (J9 JVM on AIX / Linux). Specifically, I'm interested in the CPU samples output from hprof, and particularly, the startup time (from the time WS is started until it is "ready for business").

The problem is, I can't get the CPU samples output in the hprof result file.

I'm using the following JVM argument for configuring hprof: -Xrunhprof:cpu=samples,file=path-to-hprof.txt, for dumping hprof output in ASCII format. According to the generated hprof output, the CPU SAMPLES output is only generated at program exit:

HEAP DUMP, SITES, CPU SAMPLES|TIME and MONITOR DUMP|TIME records are generated at program exit.

So, for shutting down WebSphere gracefully after it successfully started, I'm using the stopServer.sh script, and expecting the CPU SAMPLES output to be present in the result java.hprof.txt file after shutdown completes, but it isn't.

What am I doing wrong? Is there a better method for using hprof with WebSphere and generating CPU profiling output? Any help will be much appreciated!

Edit: I'm running WebSphere version 8.0.0.11 over IBM J9 VM (build 2.6, JRE 1.6.0 20150619_253846) on RHEL 7.5.

P.S.: I also looked for a way for closing WS from the management console GUI, but couldn't find any.

P.P.S.: In the meanwhile I'm using the very nice jvmtop tool with the --profile <pid> option, but that provides only partial insight, and as opposed to hprof, has to be attached on the fly, so some parts of the execution are lost.


Solution

  • Thanks to @kgibm's helpful hints, I realized I was on the right track, and went back the next day to try again. Surprisingly, this time, it worked! The hprof file was generated with the expected WebSphere CPU samples output.

    I kept experimenting to figure out what I got wrong in the first place. Here's what I think has happened:

    1. At first, I had a couple of native agents specified in WebSphere JVM arguments. The combination of these agents caused WS to run much slower. When I killed WS, there were a few seconds between the Server server1 stop completed message was printed and hprof.txt being completely written. I believe I was too quick to view hprof.txt, before the CPU samples output was actually written.

    2. Then, for troubleshooting this issue, I added the doe=n parameter to the hprof argument. doe stands for Dump On Exit, and defaults to y. Only later I realized that this is probably wrong, since as quoted, CPU samples output is only generated at exit.

    I think that these two issues together contributed to my confusion, so when I started clean, everything was OK.

    Perhaps it is worth clarifying in hprof documentation that the doe=n option is conflicting with cpu=samples, and possibly with the other options that write on exit as well (I didn't see such an indication in the docs, but it's possible I've missed it).