Wednesday, June 27, 2012

Simple Tools To Analyze Thread Dumps

When debugging problems in, or analyzing the performance of, your Java applications, you sometimes have to sift through one or more Java thread dumps. Depending on the size and complexity of your Java applications, these thread dumps can be lengthy. It is hard to use a text editor and scan it for things like long running threads, blocked threads and deadlocked threads.

Clearly, you can use sophisticated Java analysis tools (such as JProfiler) for such a job. However, there are instances where you simply don't have access to such tools. For example, if you are supporting a remote customer and all you have is the ability to ask the customer to use "jstack" or to send a SIGQUIT signal to the Java app process. You usually cannot ask a customer to stop an app and restart it with the appropriate debugging system properties and options and then connect to it with a tool like JProfiler (which also assumes they even have such a tool installed and available).

So, what you normally have to fall back on is obtaining a thread dump (which is nothing more than normal text). Once that thread dump is captured, you can save it as a .txt file, send it around via email, post it via pastebin or even send it over IM (if its small enough).

A simple way you can get a thread dump is via the "jstack" utility that ships with the JDK. Once you know the pid of your Java JVM process (to get this information, you can use the standard "ps" utility or you can use "jps", which is another JDK utility), you simply tell "jstack" to output a thread dump associated with that process which you redirect to a file:

# find the process IDs of all of my running Java applications
$ jps
21147 Jps
16640 Main

# take an initial thread dump snapshot of my "Main" Java application
$ jstack 16640 > /tmp/my-thread-dump.txt

# take a second thread dump snapshot and append it to the original
$ jstack 16640 >> /tmp/my-thread-dump.txt

The question then is - how do you analyze it? I recently came across a couple of small, free, easy-to-install, easy-to-run GUI tools that help assist in analyzing thread dumps. The first is Samurai and the second is TDA (Thread Dump Analyzer). They are easy to download and install:
  • Samurai is downloaded as a simple samurai.jar file that you start via "java -jar samurai.jar"
  • TDA is downloaded as a zip file that you unzip which again gives you a simple tda.jar that you start via "java -jar tda.jar"
I like the extremely simple way to install and run these.

Each tool has basically the same way to start - you just tell it to open up a text file that contains one or more thread dumps in them (such as the "my-thread-dump.txt" file that my example above captured).

Each tool has its own way of displaying the threads:

Samurai Main Screen That Shows Threads and Their States
TDA Main Screen That Shows Threads and Their States

Its fairly obvious how to use these tools, so I won't bore you with details. Just click around, and you'll get it fairly quickly. They both have relatively intuitive user interfaces. There isn't much too them - don't expect any artificial intelligence to analyze your threads - but you can use them to navigate through the stack traces of all the threads easier than scrolling up and down the base text file.

Notice also that both tools can analyze multiple thread dumps if it detects more than one in your thread dump text file (in my snapshots above, I had three thread dumps). It is excruciating trying to do this by hand (scrolling through multiple thread dumps in a single file) so this is when these tools are really of great help. I also like how Samurai, in one view, shows the states of the threads that are common in the multiple thread dumps. So you can see which threads were running, blocked, etc and which ones changed states between the different thread dumps. However, Samurai loses points because, if thread names are really long, you are forced to scroll horizontally to see the names of the shorter-named threads and to see the states.

One final thing I wanted to show is how they can quickly point out the threads that are deadlocked. This is where manually scanning thread dumps with your eyes can be slow, so these tools can definitely save some time. I took some deadlock code I found online and ran it. Using "jstack", I captured a thread dump and used both tools to see how they report the deadlock. Here's what to expect if you analyze thread dumps that have deadlocks in them:

Samurai Table of Threads Showing The Two That Are Deadlocked
Samurai Showing In Red The Stacktraces of the Deadlocked Threads
TDA Showing The Deadlocked Threads and Their Stacktraces
That's all I wanted to show. It looks like either of these tools can be useful to do simple analysis of thread dumps and to help quickly determine if a thread dump has one or more deadlocks.

Monday, June 18, 2012

EJB Calltime Monitoring

I wanted to give a brief illustration of RHQ's calltime monitoring feature by showing how it is used to collect EJB calltime statistics. The idea here is that I have an EJB application (in my example, I am using the RHQ Server itself as the monitored application!) and I want to see calltime statistics for the EJB method calls being made. For example, in my EJB application being monitored, I have many EJB Stateless Session beans (SLSBs). I want to see how my EJB SLSBs are behaving by looking at how many times my EJB SLSB methods are called and how efficient they are (that is, I want to see the maximum, minimum and average time each EJB SLSB method took).

So, first what I do is go to my EJB SLSB resources that are in inventory and I enable the "Method Invocation Time" calltime metric. You can do this on an individual EJB resource or you can do a bulk change of all your EJB's schedules by navigating to your EJB autogroup and changing the schedule in the autogroup's Schedules subtab:

At this point, the RHQ Agent will collect the information and send it up to the RHQ Server just like any other metric collection. Over time, I can now examine my EJB SLSB calltime statistics by traversing to my EJB resource's Calltime subtab:

I can even look at an aggregate view of all my EJBs via the EJB autogroup. In the RHQ GUI, you will see in the left hand tree all of my EJB SLSBs are children of the autogroup node "EJB3 Session Beans". If I select that autogroup node and navigate to its Calltime subtab, I can see all of the measurements for all of my EJB SLSBs:

Note that this calltime measurement collection feature is not specific to EJBs, or the JBossAS 4 plugin. This is a generic subsystem supported by any plugin that wants to enable this feature. If you want to write your own custom plugin that wants to monitor calltime-like statistics (say, for HTTP URL endpoints or any other type of infrastructure that has a "duration" type metric that can be collected) you can utilize the calltime subsystem to collect your information and let the RHQ Server store it and display it in this Calltime subtab for your resource.