Wednesday, March 25, 2009

Correlating Events with Jopr

Today, I rediscovered how nice Jopr really is when I enhanced the agent plugin so it can track the agent log files.

In just an hour or two, I added a feature allowing you to enable log tracking for the agent itself. If you enable this feature, you can view the agent's log messages directly in the Jopr UI. This lets you see what's going on inside an agent, and you can correlate those log events with other types of changes happening to your agent (e.g. configuration changes, monitoring data, alerts, etc.).

Let me discuss some of the nice things this allows you to do. Remember, even though this is a concrete example using the agent itself as the managed resource, everything I'm about to discuss can be done for your own managed resources, because all of these features are abstract and can be used by any plugin, should the plugin developer choose to use them. This is what an abstract management framework provides you and is what Jopr is all about.

First, I took between one and two hours to add code to the agent plugin in order for it to utilize the event subsystem provided by Jopr. A little bit of Java code, a little bit of XML and I went from nothing to being able to fully integrate the agent log files into the Jopr events subsystem.
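
To give a rough idea of what "reporting events from a log file" involves, here is a minimal sketch of turning one log line into an event with a timestamp, a severity and a detail message. This is not the actual agent plugin code and it does not use the Jopr plugin API: the class names (LogLineParser, LogEvent) and the assumed line layout ("yyyy-MM-dd HH:mm:ss,SSS LEVEL message") are hypothetical and only there for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch; not the Jopr plugin API or the real agent plugin code. */
class LogLineParser {

    /** Severity levels, declared in ascending order of importance. */
    enum Severity { DEBUG, INFO, WARN, ERROR, FATAL }

    /** A single log message expressed as an event. */
    static final class LogEvent {
        final String timestamp;
        final Severity severity;
        final String detail;

        LogEvent(String timestamp, Severity severity, String detail) {
            this.timestamp = timestamp;
            this.severity = severity;
            this.detail = detail;
        }
    }

    // Assumed layout: "yyyy-MM-dd HH:mm:ss,SSS LEVEL rest-of-message".
    private static final Pattern LINE = Pattern.compile(
        "^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3})\\s+"
        + "(DEBUG|INFO|WARN|ERROR|FATAL)\\s+(.*)$");

    /**
     * Parses one log line. Lines that do not match the assumed layout
     * (for example stack trace continuation lines) yield an empty result.
     */
    static Optional<LogEvent> parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(
            new LogEvent(m.group(1), Severity.valueOf(m.group(2)), m.group(3)));
    }
}
```

A plugin along these lines would read new lines from the log file, pass each one through something like parse(), and hand the resulting events to the management framework, which then takes care of storing and displaying them.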

OK, so, what does this get you? First, and most obviously, you can now view the log message events from within the Jopr UI. You do not have to remotely log into the machine where the agent is running to view its log messages. See the image here on the left - this is the event history view. Because the agent emits its log messages as events, this event history view is essentially browsing the log file, with the added bonus of being able to filter the view based on the log message content and the severity of the messages (INFO, WARN, ERROR, etc.).

Second, you can view the events correlated with monitoring data - this allows you to see what your resource's measurements looked like at the time the resource was logging messages (which might help you in diagnosing a problem). Take a look at the bottom of the graphs and you can see the different colored icons that indicate the highest severity of events that occurred in each timeslice shown on the graph. Here you can see INFO (green), WARN (yellow) and DEBUG (blue) messages occurring at different times. And notice how you can correlate those times with measurement data and event activity.
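
The "highest severity per timeslice" idea behind those icons can be sketched in a few lines. This is only an illustration of the grouping logic, not how Jopr implements it: the class name TimesliceSeverity, the method highestPerSlice and the severity ordering (DEBUG lowest, ERROR highest) are assumptions.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of bucketing events into timeslices; not Jopr code. */
class TimesliceSeverity {

    /** Assumed ordering: declared from least to most severe. */
    enum Severity { DEBUG, INFO, WARN, ERROR }

    /**
     * Groups events into fixed-width timeslices and keeps only the highest
     * severity seen in each slice. The map key is the slice index
     * (timestamp divided by the slice width); slices with no events are absent.
     */
    static Map<Long, Severity> highestPerSlice(
            long[] timestampsMillis, Severity[] severities, long sliceMillis) {
        if (timestampsMillis.length != severities.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        if (sliceMillis <= 0) {
            throw new IllegalArgumentException("sliceMillis must be positive");
        }
        Map<Long, Severity> result = new TreeMap<>();
        for (int i = 0; i < timestampsMillis.length; i++) {
            long slice = timestampsMillis[i] / sliceMillis;
            // Keep whichever severity is higher for this slice.
            result.merge(slice, severities[i],
                (a, b) -> a.compareTo(b) >= 0 ? a : b);
        }
        return result;
    }
}
```

Each entry in the returned map corresponds to one icon under the graph: the slice tells you where to draw it and the severity tells you which color to use.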

You can even drill down into the log file directly from this view so you can read the first several log messages that occurred within a narrow span of time. Again, this might help you to diagnose a problem if, using this agent resource as an example, you can see the actual ERROR log messages that occurred in or around the same time you saw the average execution time for sent commands starting to go up.



Jopr can be more proactive with these events/log messages, as well. You can define an alert definition such that you will get notified (via email for example) if the agent emits a log message at the ERROR severity level. You can even be alerted when a specific log message is emitted (e.g. have you ever wanted to be emailed when your application spits out an OutOfMemoryError log message? Now you can!). The alert definition UI page allows you to set this up.
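
Conceptually, an alert condition on events boils down to "severity at or above some level" and/or "message matches some text". The sketch below shows that check in isolation. It is not the Jopr alert subsystem or its API: the class EventAlertCondition and its constructor arguments are hypothetical, and in Jopr itself you set this up through the alert definition UI rather than in code.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of an event-based alert condition; not Jopr code. */
class EventAlertCondition {

    /** Assumed ordering: declared from least to most severe. */
    enum Severity { DEBUG, INFO, WARN, ERROR }

    private final Severity minSeverity;  // null means "any severity"
    private final Pattern detailPattern; // null means "any message"

    EventAlertCondition(Severity minSeverity, String detailRegex) {
        this.minSeverity = minSeverity;
        this.detailPattern = detailRegex == null ? null : Pattern.compile(detailRegex);
    }

    /** Returns true if an event with this severity and message should fire the alert. */
    boolean matches(Severity severity, String detail) {
        boolean severityOk = minSeverity == null
            || severity.compareTo(minSeverity) >= 0;
        boolean detailOk = detailPattern == null
            || detailPattern.matcher(detail).find();
        return severityOk && detailOk;
    }
}
```

For example, new EventAlertCondition(Severity.ERROR, null) would fire on any ERROR message, while new EventAlertCondition(null, "OutOfMemoryError") would fire on any message mentioning OutOfMemoryError, whatever its severity.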

And finally, you can use the summary timeline to further correlate log message events with other things that have happened to this agent resource. For example, notice that I can see when my event messages occurred (and what their highest severities were) on this timeline, correlated with other things that happened to this agent, such as when its configuration changed and when alerts were triggered. Had any operations been executed during the timeframe shown, you would see those on the timeline as well. In effect, you get a holistic view of what happened to this agent resource, across all the different management facets: configuration changes, control executions, alerts, events, etc.!


Once again, I must emphasize the fact that you can get all of this functionality, too - and all you need is to write a plugin that talks to your managed resource and provides the raw data to Jopr. You get everything else for free - the correlated timeline, the monitoring graphs, alerting and more.

So, even though the above screenshots show this functionality for the agent resource, the UI and all of these capabilities would be the same for any resource you want to manage, so long as that managed resource has a plugin that provides the same kind of raw information. In my case, I just had to spend a couple of hours to get the agent to report events from its log files, and Jopr took care of the rest. For example, I did not have to do anything to enable the alerting capabilities or the correlated timeline. That is the value that Jopr brings to the table!
