Tags: java, debugging, deadlock, thread-dump

Thread Dump Analysis Tool / Method


When a Java application hangs and you don't even know which use case is triggering the problem, thread dumps can be a useful way to investigate.

But how can we easily derive useful data from thread dumps to find where the problem is? The server application I've been working with produces very long thread dumps, because it is an EJB architecture and the dumps contain many container threads that I'm not sure I should be looking at (i.e. threads running JBoss's code rather than my application code).

Yesterday I tried the Thread Dump Analyzer tool. The tool is definitely better than looking at the raw thread dumps in a text editor, because you can filter out threads you're not interested in, see the thread list, click on a thread to see its details, compare thread dumps to find long-running threads, etc.

But there's still too much data to analyse - almost 300 threads. I don't know of any criteria that I could use to filter out all the JBoss threads, in which I'm not interested. I'm not sure if I should be looking at threads that are currently in "runnable" state only or if "waiting on condition" and "in Object.wait" are also important.

What approach would you normally follow, and which tools would you generally use?


Solution

  • One set of thread dumps alone will not be enough to get to the root cause.

    The trick is to take 4 or 5 sets of thread dumps at an interval of 5 seconds between each, so at the end you have a single log file covering around 20 - 25 seconds worth of action on the app server.
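    If you want to script the sampling from inside the JVM rather than invoking an external tool like jstack, the standard `ThreadMXBean` API can produce jstack-like dumps. A minimal sketch (the class name, sample count, and interval are my own placeholders; the answer suggests 4 - 5 samples, 5 seconds apart):

    ```java
    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;

    public class DumpSampler {
        public static void main(String[] args) throws InterruptedException {
            ThreadMXBean mx = ManagementFactory.getThreadMXBean();
            // Take 3 samples at a short interval; use 4-5 samples, 5 s apart, in practice.
            for (int i = 1; i <= 3; i++) {
                System.out.println("=== Thread dump " + i + " ===");
                // true, true = also report locked monitors and ownable synchronizers
                for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
                    System.out.print(info); // ThreadInfo.toString() resembles a jstack entry
                }
                Thread.sleep(500);
            }
        }
    }
    ```

    Appending all samples to one file this way gives you the single log that the comparison step below relies on.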

    What you want to check for is this: when a thread is stuck or a transaction is long-running, every thread dump will show the same thread id at the same line in your Java stack trace. In simpler terms, the transaction (say in an EJB or database call) spans multiple thread dumps and hence needs more investigation.
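    That comparison can itself be automated: sample each thread's topmost stack frame twice and flag threads that stay BLOCKED at an unchanged frame. A self-contained sketch (the class and thread names are illustrative, not from any tool mentioned here):

    ```java
    import java.util.HashMap;
    import java.util.Map;

    public class StuckThreadFinder {
        // Map each live thread to its topmost stack frame.
        static Map<Thread, StackTraceElement> topFrames() {
            Map<Thread, StackTraceElement> tops = new HashMap<>();
            for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
                if (e.getValue().length > 0) {
                    tops.put(e.getKey(), e.getValue()[0]);
                }
            }
            return tops;
        }

        public static void main(String[] args) throws InterruptedException {
            final Object lock = new Object();
            synchronized (lock) {
                // A deliberately stuck thread: it blocks on a monitor that main holds.
                Thread stuck = new Thread(() -> { synchronized (lock) { } }, "stuck-demo");
                stuck.start();
                while (stuck.getState() != Thread.State.BLOCKED) {
                    Thread.sleep(10);
                }
                Map<Thread, StackTraceElement> first = topFrames();
                Thread.sleep(500); // sampling interval; the answer uses 5 seconds
                Map<Thread, StackTraceElement> second = topFrames();
                // Flag threads that are BLOCKED at the same frame in both samples.
                for (Map.Entry<Thread, StackTraceElement> e : second.entrySet()) {
                    StackTraceElement before = first.get(e.getKey());
                    if (before != null && before.equals(e.getValue())
                            && e.getKey().getState() == Thread.State.BLOCKED) {
                        System.out.println(e.getKey().getName() + " stuck at " + e.getValue());
                    }
                }
            }
        }
    }
    ```

    The same idea applied by hand: grep the dump file for a thread's `nid`, and if its top frame is identical across samples, that thread deserves a closer look.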

    Now when you run these through Samurai (I haven't used TDA myself), it will highlight the stuck threads in red so you can quickly click through to the lines showing the issue.

    See an example of this here; look at the Samurai output image in that link. Green cells are fine; red and grey cells need looking at.

    A Samurai example from my own web app below shows a stuck sequence for thread '19' across a span of 5 - 10 seconds:

    ```
    Thread dump 2/3 "[ACTIVE] ExecuteThread: '19' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon prio=7 tid=07b06000 nid=108 lwp_id=222813 waiting for monitor entry [2aa40000..2aa40b30]
       java.lang.Thread.State: BLOCKED (on object monitor)
            at com.bea.p13n.util.lease.JDBCLeaseManager.renewLease(JDBCLeaseManager.java:393)
            - waiting to lock <735e9f88> (a com.bea.p13n.util.lease.JDBCLeaseManager)
            at com.bea.p13n.util.lease.Lease$LeaseTimer.timerExpired(Lease.java:229)
    ```

    ...

    ```
    Thread dump 3/3 "[ACTIVE] ExecuteThread: '19' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon prio=7 tid=07b06000 nid=108 lwp_id=222813 waiting for monitor entry [2aa40000..2aa40b30]
       java.lang.Thread.State: BLOCKED (on object monitor)
            at com.bea.p13n.util.lease.JDBCLeaseManager.renewLease(JDBCLeaseManager.java:393)
            - waiting to lock <735e9f88> (a com.bea.p13n.util.lease.JDBCLeaseManager)
            at com.bea.p13n.util.lease.Lease$LeaseTimer.timerExpired(Lease.java:229)
    ```

    Update

    I recently used the Java Thread Dump Analyzer mentioned in this answer, and for Tomcat it has been more useful than Samurai.