Thursday, March 26, 2009

Cross-Facet Correlation Provided by Jopr

I'm surprised you are still reading this, given the nebulous title of this blog post :). But I'm glad you are here. I want to further explain what my last blog post was really trying to convey.

Jopr is essentially an abstract management framework. At its core, Jopr does not understand the concrete resources it is actually managing. Specific, concrete knowledge of the managed resource implementations is pushed out to the edges, into the agent plugins themselves.

What Jopr does know, at its core, are the different management "facets" that those managed resources support. What is a "management facet"? Simple - a facet is a subset of management functionality that any given managed resource may or may not support. Facets are orthogonal - a managed resource and its plugin can support any number of facets in any combination.

Jopr supports several types of facets - for example, the measurement facet (for collecting metric data), the operation facet (for controlling managed resources), the configuration facet (for retrieving and setting resource configurations), the content facet (for pushing and pulling file content to/from managed resources), the event facet (to emit asynchronous event information such as a managed resource's log file messages) and others.
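To make "orthogonal facets" a little more concrete, here is a minimal Java sketch. The interfaces and classes below (MeasurementFacet, OperationFacet, ConfigurationFacet, ContentFacet, AgentLikeComponent, AppServerLikeComponent, FacetSketch) are hypothetical, simplified stand-ins with invented method signatures, not the real Jopr plugin API. The idea they illustrate: each facet is its own interface, a resource component implements whatever combination it supports, and the core only asks which facets are present.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified facet interfaces; NOT the real Jopr plugin API.
interface MeasurementFacet {
    Map<String, Double> collectMetrics();
}

interface OperationFacet {
    String invokeOperation(String name);
}

interface ConfigurationFacet {
    Map<String, String> loadConfiguration();
}

interface ContentFacet {
    List<String> discoverPackages();
}

// Illustrative component that supports measurement, operation and configuration, but not content.
class AgentLikeComponent implements MeasurementFacet, OperationFacet, ConfigurationFacet {
    public Map<String, Double> collectMetrics() { return Map.of("exampleMetric", 42.0); }
    public String invokeOperation(String name) { return "invoked " + name; }
    public Map<String, String> loadConfiguration() { return Map.of("exampleSetting", "exampleValue"); }
}

// Illustrative component that supports measurement and content, but not configuration.
class AppServerLikeComponent implements MeasurementFacet, ContentFacet {
    public Map<String, Double> collectMetrics() { return Map.of("exampleMetric", 7.0); }
    public List<String> discoverPackages() { return List.of("example-library.jar"); }
}

class FacetSketch {
    // The core only asks which facets a component implements; it knows nothing else about the resource.
    static List<String> supportedFacets(Object component) {
        List<String> facets = new ArrayList<>();
        if (component instanceof MeasurementFacet) facets.add("measurement");
        if (component instanceof OperationFacet) facets.add("operation");
        if (component instanceof ConfigurationFacet) facets.add("configuration");
        if (component instanceof ContentFacet) facets.add("content");
        return facets;
    }

    public static void main(String[] args) {
        System.out.println(supportedFacets(new AgentLikeComponent()));     // [measurement, operation, configuration]
        System.out.println(supportedFacets(new AppServerLikeComponent())); // [measurement, content]
    }
}
```

Because the check is a plain type test, adding a new kind of resource means writing a new component class, not touching the code that asks the question.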

There are numerous advantages to having an abstract management platform that can handle all of these different management facets for many different kinds of managed resources. One of these advantages is the ability to correlate and process information across all the different facets.

Look at Jopr's summary timeline and what "cross-facet correlation" means soon becomes very apparent. This timeline (with time moving from left to right) correlates, in one view, information such as:
  • when the resource was up and when it was down; notice the background color: light green means the resource was up at that time, red means the resource was down, and grey means the state of the resource was unknown (measurement facet)
  • when events (e.g. log messages) happened, how many of them occurred and how severe they were; severity is shown in the badging of the event icons (event facet)
  • when alerts were raised and how severe those alerts were (alerts are shown as flag icons, with their severity indicated by their color). Alerting isn't an actual management facet per se; the alert subsystem is provided to all managed resources with no effort needed on the part of the plugin - alerts are provided "for free".
  • when a resource's configuration was changed and whether that change succeeded or failed (configuration facet)
  • when an operation was invoked on a resource and whether that attempt to control the resource succeeded (operation facet)
This is what "cross-facet correlation" is all about. How is this helpful? Well, if I see a gap in the events on my timeline, and that gap has a red background, that tells me very quickly that I am not receiving any events because the resource is down! If I see that I changed the configuration of a resource, and soon after I see a flood of warning events, perhaps followed by a red background, I can immediately suspect that the configuration change had an adverse effect on the behavior of my resource. If I executed an operation and soon after see one or more alerts trigger, I should start by investigating whether that operation caused the resource to act strangely.
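The reasoning above is something a person does by eye on the timeline, but the underlying idea is simple: merge each facet's history into one time-ordered list, then look for suspicious sequences. Here is a rough Java sketch of that idea, assuming a hypothetical Entry record and an arbitrary time window. It is an illustration, not how Jopr builds its timeline, and the "config change followed by warnings" check is just one example heuristic.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class TimelineSketch {
    // Which facet (or subsystem) a timeline entry came from.
    enum Source { AVAILABILITY, EVENT, ALERT, CONFIGURATION, OPERATION }

    // One point on the timeline; 'detail' is free text such as "WARN" or "FAILED".
    record Entry(long timeMillis, Source source, String detail) {}

    // Merge the per-facet histories into one chronologically ordered timeline.
    static List<Entry> merge(List<List<Entry>> histories) {
        List<Entry> all = new ArrayList<>();
        for (List<Entry> history : histories) {
            all.addAll(history);
        }
        all.sort(Comparator.comparingLong(Entry::timeMillis));
        return all;
    }

    // Flag configuration changes followed, within windowMillis, by a WARN or ERROR event.
    static List<Entry> suspiciousConfigChanges(List<Entry> timeline, long windowMillis) {
        List<Entry> suspects = new ArrayList<>();
        for (Entry change : timeline) {
            if (change.source() != Source.CONFIGURATION) {
                continue;
            }
            for (Entry e : timeline) {
                long delta = e.timeMillis() - change.timeMillis();
                boolean severe = e.detail().equals("WARN") || e.detail().equals("ERROR");
                if (e.source() == Source.EVENT && severe && delta > 0 && delta <= windowMillis) {
                    suspects.add(change);
                    break;
                }
            }
        }
        return suspects;
    }

    public static void main(String[] args) {
        List<Entry> configHistory = List.of(new Entry(1_000, Source.CONFIGURATION, "SUCCESS"));
        List<Entry> eventHistory = List.of(
                new Entry(500, Source.EVENT, "INFO"),
                new Entry(1_200, Source.EVENT, "WARN"));
        List<Entry> timeline = merge(List.of(configHistory, eventHistory));
        System.out.println(timeline.size());                                 // 3
        System.out.println(suspiciousConfigChanges(timeline, 5_000).size()); // 1
    }
}
```

The important part is the merge step: once every facet's history lives on one time axis, any cross-facet question becomes a simple scan.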

Cross-facet correlation occurs in other areas of the Jopr UI (and the sky's the limit to where we can take Jopr from here in the future). For example, here is the summary tab for my agent. Notice how all the different facets are combined into a single view so at a glance you can see what is currently going on with this agent resource.

I can see that I had one operation, out of the past three, that failed. I can see that I recently updated the configuration of this resource several times, one of which failed. I can see what the current measurements are for this resource, and what events and alerts have been logged, when they happened and their severity.

All of this data links to its respective page in the UI - if I want to see more about the alerts, I click the alert links. More about the operations? Click the operation links. And so on.

And because Jopr has abstracted the facets so they are applicable across any number of managed resources, we can manage all different kinds of resources - JBoss Application Servers, Apache Web Servers, Tomcat Web Application Servers, hardware boxes, operating system services, even Jopr itself - and we reuse the same UI pages, the same code, and the same look & feel - no additional code needs to be introduced to the server to support additional types of managed resources!

For example, above you see the summary for the Jopr agent. What does the summary information look like for a JBoss Application Server that I am managing? You can see this here. Notice that the same look & feel, the same UI pages and code, and the same SQL queries are used - everything is reused. No additional integration is needed on the server side.

But notice the difference. The agent resource supports the "configuration facet", which is why you saw the "Configure" tab and the configuration update information earlier. But the agent does not support the "content facet" (the agent does not push or pull content over the Jopr content subsystem). The JBossAS resource is the opposite: it does not support the configuration facet (so you don't see the Configure tab and there are no recent config updates), but it does support the content facet (you can see the Content tab along with some recent package history). The JBossAS resource component in the agent sends the Jopr server information about what packages (e.g. jar libraries) it has installed. If a resource supports it, you can even ship updated packages down to the resource (for example, new jars that incorporate bug fixes).
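One way to picture "same pages, different tabs" is a single generic routine that decides which tabs to show purely from the set of facets a resource type declares. The Java sketch below assumes a hypothetical Facet enum and TabSketch class. Only the "Summary", "Configure" and "Content" tab names come from the screenshots discussed above; "Monitor", "Operations" and "Alerts" are assumed labels, and the facet sets in main are illustrative.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

class TabSketch {
    // Hypothetical facet flags that a plugin might declare for a resource type.
    enum Facet { MEASUREMENT, OPERATION, CONFIGURATION, CONTENT, EVENT }

    // One generic routine builds the tab list for every resource type.
    static List<String> tabsFor(Set<Facet> facets) {
        List<String> tabs = new ArrayList<>();
        tabs.add("Summary"); // every resource gets a summary
        if (facets.contains(Facet.MEASUREMENT)) tabs.add("Monitor");
        if (facets.contains(Facet.OPERATION)) tabs.add("Operations");
        if (facets.contains(Facet.CONFIGURATION)) tabs.add("Configure");
        if (facets.contains(Facet.CONTENT)) tabs.add("Content");
        tabs.add("Alerts"); // alerting comes "for free", whatever the facets
        return tabs;
    }

    public static void main(String[] args) {
        // Illustrative facet sets only.
        Set<Facet> agent = EnumSet.of(Facet.MEASUREMENT, Facet.OPERATION, Facet.CONFIGURATION, Facet.EVENT);
        Set<Facet> jbossAs = EnumSet.of(Facet.MEASUREMENT, Facet.CONTENT);
        System.out.println(tabsFor(agent));   // [Summary, Monitor, Operations, Configure, Alerts]
        System.out.println(tabsFor(jbossAs)); // [Summary, Monitor, Content, Alerts]
    }
}
```

Nothing in tabsFor mentions agents or application servers, which is the point: supporting a new resource type needs no new server-side code, only a new facet declaration.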

The above just talks about correlating cross-facet information for a specific resource. What if I want to see information across my entire inventory? What if I want to see all of the alerts that were triggered by Jopr, regardless of the resource that triggered them? What if I want to see all the configuration changes made to my environment, regardless of which resource was reconfigured?

Again, because Jopr is abstract in nature, we can scan the inventory history and aggregate this kind of information. Below you see that we can view all the alerts and all configuration changes - you can even filter the results if you only care about a subset of the data.
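Because every resource's alerts share one abstract shape, an inventory-wide view is essentially a flatten, filter and sort. Here is a small Java sketch of that idea, assuming a hypothetical Alert record; the severity strings "HIGH" and "LOW" are illustrative values, not a claim about Jopr's data model. The same shape would work for configuration updates.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class InventoryQuerySketch {
    // Hypothetical alert record; severity values such as "HIGH" and "LOW" are illustrative.
    record Alert(String resourceName, String severity, long timeMillis) {}

    // Flatten every resource's alert history into one list, newest first,
    // optionally keeping only a single severity (pass null for "all").
    static List<Alert> alertsAcrossInventory(List<List<Alert>> perResource, String severityFilter) {
        return perResource.stream()
                .flatMap(List::stream)
                .filter(a -> severityFilter == null || a.severity().equals(severityFilter))
                .sorted(Comparator.comparingLong(Alert::timeMillis).reversed())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<Alert>> inventory = List.of(
                List.of(new Alert("agent", "HIGH", 300), new Alert("agent", "LOW", 100)),
                List.of(new Alert("jboss-as", "HIGH", 200)));
        System.out.println(alertsAcrossInventory(inventory, null).size());   // 3
        System.out.println(alertsAcrossInventory(inventory, "HIGH").size()); // 2
    }
}
```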

I think this clearly shows how an abstract management platform that supports multiple management facets can provide tremendous value to anyone managing a network of hardware and software products.

Well, that's all I have to say on this subject (for now). I hope this makes it a bit clearer what Jopr brings to the table and the value it can add to your IT environment.

Wednesday, March 25, 2009

Correlating Events with Jopr

Today, I rediscovered how nice Jopr really is when I enhanced the agent plugin so it can track the agent log files.

In just an hour or two, I added a feature allowing you to enable log tracking for the agent itself. If you enable this feature, you can view the agent's log messages directly in the Jopr UI. This lets you see what's going on inside an agent, and you can correlate those log events with other types of changes happening to your agent (e.g. configuration changes, monitoring data, alerts, etc).

Let me discuss some of the nice things this allows you to do. Remember, even though this is a concrete example using the agent itself as the managed resource, everything I'm about to discuss can be done for your own managed resources because all of these features are abstract and can be utilized by any plugin, should the plugin developer choose to use them. This is what an abstract management framework provides you and is what Jopr is all about.

It took me between one and two hours to add code to the agent plugin so that it uses the event subsystem provided by Jopr. A little bit of Java code and a little bit of XML took me from nothing to fully integrating the agent log files into the Jopr events subsystem.
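The heart of "log file to events" is turning each log line into a structured event with a severity. The Java sketch below shows only that step. The line layout it assumes ("yyyy-MM-dd HH:mm:ss,SSS LEVEL message") is hypothetical, not the agent's documented log format, and the Event record is not the Jopr event API; the XML plugin-descriptor side is not shown.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogToEventSketch {
    enum Severity { DEBUG, INFO, WARN, ERROR, FATAL }

    // Hypothetical event shape: when it happened, how severe it was, and the message text.
    record Event(String timestamp, Severity severity, String detail) {}

    // Assumed line layout: "yyyy-MM-dd HH:mm:ss,SSS LEVEL message" (illustrative only).
    private static final Pattern LINE = Pattern.compile(
            "^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3})\\s+(DEBUG|INFO|WARN|ERROR|FATAL)\\s+(.*)$");

    // Returns an event for lines that match the layout, empty otherwise.
    static Optional<Event> parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return Optional.empty(); // e.g. a stack-trace continuation line
        }
        return Optional.of(new Event(m.group(1), Severity.valueOf(m.group(2)), m.group(3)));
    }

    public static void main(String[] args) {
        System.out.println(parse("2009-03-25 10:15:32,123 WARN  Command took too long").isPresent()); // true
        System.out.println(parse("\tat java.lang.Thread.run(Thread.java:619)").isPresent());           // false
    }
}
```

A real log tracker would also need to remember its position in the file and cope with rotation; this sketch leaves both out.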

OK, so, what does this get you? First, and the most obvious, is you can now view the log message events from within the Jopr UI. You do not have to remotely log into the machine where the agent is running to view its log messages. See the image here on the left - this is the event history view. Because the agent emits its log messages as events, this event history view is essentially browsing the log file, with the added bonus of being able to filter the view based on the log message content and the severity of the messages (INFO, WARN, ERROR, etc).
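Filtering the event history by severity and message content boils down to two predicates. Here is a minimal Java sketch, assuming a hypothetical Event record and a severity enum declared from least to most severe; the sample log messages in main are made up for illustration.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class EventFilterSketch {
    // Declared from least to most severe so that enum ordering can be compared.
    enum Severity { DEBUG, INFO, WARN, ERROR, FATAL }

    record Event(long timeMillis, Severity severity, String detail) {}

    // Keep events at or above minSeverity whose text contains a match for detailRegex (null = no text filter).
    static List<Event> filter(List<Event> history, Severity minSeverity, String detailRegex) {
        Pattern pattern = (detailRegex == null) ? null : Pattern.compile(detailRegex);
        return history.stream()
                .filter(e -> e.severity().compareTo(minSeverity) >= 0)
                .filter(e -> pattern == null || pattern.matcher(e.detail()).find())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Event> history = List.of(
                new Event(1, Severity.INFO, "Agent started"),
                new Event(2, Severity.WARN, "Command timeout, retrying"),
                new Event(3, Severity.ERROR, "Failed to send command"));
        System.out.println(filter(history, Severity.WARN, null).size());      // 2
        System.out.println(filter(history, Severity.INFO, "timeout").size()); // 1
    }
}
```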

Second, you can view the events correlated with monitoring data - this allows you to see what your resource's measurements looked like at the time the resource was logging messages (which might help you in diagnosing a problem). Take a look at the bottom of the graphs and you can see different colored icons indicating the highest severity of the events that occurred in each timeslice shown on the graph. Here you can see INFO (green), WARN (yellow) and DEBUG (blue) messages occurring at different times. And notice how you can correlate those times with measurement data and event activity.
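The icons along the bottom of a graph summarize each timeslice by its highest event severity. Here is a small Java sketch of that bucketing, under the assumption of fixed-width slices starting at a given time; the names (TimesliceSketch, highestPerSlice) are hypothetical, and this is not Jopr's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

class TimesliceSketch {
    enum Severity { DEBUG, INFO, WARN, ERROR, FATAL }

    record Event(long timeMillis, Severity severity) {}

    // Slice i covers [start + i*width, start + (i+1)*width); each slot holds the
    // highest severity seen in that slice, or null if no events fell into it.
    static List<Severity> highestPerSlice(List<Event> events, long start, long width, int sliceCount) {
        List<Severity> result = new ArrayList<>();
        for (int i = 0; i < sliceCount; i++) {
            result.add(null);
        }
        for (Event e : events) {
            long offset = e.timeMillis() - start;
            if (offset < 0 || offset / width >= sliceCount) {
                continue; // outside the graphed time range
            }
            int slice = (int) (offset / width);
            Severity current = result.get(slice);
            if (current == null || e.severity().compareTo(current) > 0) {
                result.set(slice, e.severity());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
                new Event(10, Severity.INFO),
                new Event(20, Severity.WARN),
                new Event(70, Severity.DEBUG));
        System.out.println(highestPerSlice(events, 0, 60, 3)); // [WARN, DEBUG, null]
    }
}
```

Showing only the worst severity per slice keeps the graph readable while still pointing you at the interesting spots to drill into.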

You can even drill down into the log file directly from this view so you can read the first several log messages that occurred within a narrow span of time. Again, this might help you to diagnose a problem if, using this agent resource as an example, you can see the actual ERROR log messages that occurred in or around the same time you saw the average execution time for sent commands starting to go up.

Jopr can be more proactive with these events/log messages, as well. You can define an alert definition such that you will get notified (via email for example) if the agent emits a log message at the ERROR severity level. You can even be alerted when a specific log message is emitted (e.g. have you ever wanted to be emailed when your application spits out an OutOfMemoryError log message? Now you can!). The alert definition UI page allows you to set this up.
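The two alert examples above (any ERROR message, or a specific message such as OutOfMemoryError) can be expressed as one kind of condition: a minimum severity plus an optional text pattern. The Java sketch below assumes a hypothetical EventCondition record; it is not Jopr's alert definition model, and sending the actual notification (e.g. the email) is left out.

```java
import java.util.regex.Pattern;

class EventAlertSketch {
    enum Severity { DEBUG, INFO, WARN, ERROR, FATAL }

    record Event(Severity severity, String detail) {}

    // Hypothetical alert condition: the event must be at least minSeverity and,
    // when detailPattern is non-null, its text must contain a match.
    record EventCondition(Severity minSeverity, Pattern detailPattern) {
        boolean matches(Event e) {
            boolean severeEnough = e.severity().compareTo(minSeverity) >= 0;
            boolean textMatches = detailPattern == null || detailPattern.matcher(e.detail()).find();
            return severeEnough && textMatches;
        }
    }

    public static void main(String[] args) {
        EventCondition anyError = new EventCondition(Severity.ERROR, null);
        // DEBUG is the lowest level, so this condition ignores severity and only checks the text.
        EventCondition outOfMemory = new EventCondition(Severity.DEBUG, Pattern.compile("OutOfMemoryError"));

        Event oom = new Event(Severity.ERROR, "java.lang.OutOfMemoryError: Java heap space");
        System.out.println(anyError.matches(oom));    // true
        System.out.println(outOfMemory.matches(oom)); // true
        System.out.println(anyError.matches(new Event(Severity.WARN, "Slow response"))); // false
    }
}
```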

And finally, you can use the summary timeline to further correlate log message events with other things that have happened to this agent resource. For example, notice that I can see on this timeline when my event messages occurred (and what their highest severities were), correlated with other things that happened to this agent, such as when its configuration changed and when alerts were triggered. Had any operations been invoked during the timeframe the timeline is showing, you would see those here as well. In effect, you get a holistic view of what happened to this agent resource across all the different management facets: configuration changes, control executions, alerts, events, etc!

Once again, I must emphasize that you can get all of this functionality, too - all you need is to write a plugin that talks to your managed resource and provides the raw data to Jopr. You get everything else for free - the correlated timeline, the monitoring graphs, alerting and more.

So, even though the above screenshots show this functionality for the agent resource, the UI and all of these capabilities would be the same for any resource you want to manage, so long as that managed resource has a plugin that provides the same kind of raw information. In my case, I just had to spend a couple of hours to get the agent to report events from its log files, and Jopr took care of the rest. For example, I did not have to do anything to enable the alerting capabilities or the correlating timeline. That is the value that Jopr brings to the table!

Monday, March 23, 2009

The Mighty Embeddable Plugin Container

Heiko has just demonstrated another way that the agent-side plugin container can be embedded in any Java VM.

We've already proven that this concept works because the plugin container has already been embedded in a few places: not only does the agent itself embed the plugin container, but our unit tests do too when plugins need to be validated, and the Embedded Jopr project embeds it directly in a JBossAS5 application server!

But what Heiko has done is go a step further by providing a very small, yet useful, wrapper around the plugin container to support plugin developers (it is called the "standalone plugin container"). It is "standalone" because you no longer need to install and run a full Jopr environment (a server and a database) in order to test your plugin's functionality.

If you are writing a custom plugin, just use the standalone plugin container to deploy and execute your plugin. You take an existing agent distribution and use a very simple script to start the standalone plugin container (there are Windows and UNIX versions of the script). The simplicity of these scripts borders on the trivial. Under the covers, all they do is run a new main class that embeds the plugin container, as opposed to the original AgentMain class (which does all the complex agent-to-server communications). This new standalone plugin container accepts commands at the prompt to help you exercise your plugin code - a great help for those writing plugins. See the README for some installation instructions.
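To give a feel for how small such a wrapper can be, here is a rough Java sketch of a prompt loop that parses a command line and dispatches it. Everything in it is hypothetical: the class name, the "help" and "measure" commands, and the fake metric map stand in for calls into an embedded plugin container. It is not the standalone plugin container's actual command set; see its README for that.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

class StandaloneLoopSketch {
    // Fake metric values; a real tool would ask the embedded plugin container instead.
    private final Map<String, Double> metrics = Map.of("heapUsed", 1024.0);

    // Hypothetical commands, keyed by name (TreeMap keeps "help" output sorted).
    private final Map<String, Function<String[], String>> commands = new TreeMap<>();

    StandaloneLoopSketch() {
        commands.put("help", args -> "commands: " + String.join(", ", commands.keySet()));
        commands.put("measure", args -> (args.length == 1 && metrics.containsKey(args[0]))
                ? args[0] + "=" + metrics.get(args[0])
                : "usage: measure <metricName>");
    }

    // Parse one prompt line into a command name plus arguments and dispatch it.
    String handle(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts[0].isEmpty()) {
            return ""; // blank line
        }
        Function<String[], String> command = commands.get(parts[0]);
        if (command == null) {
            return "unknown command: " + parts[0];
        }
        return command.apply(Arrays.copyOfRange(parts, 1, parts.length));
    }

    public static void main(String[] args) throws IOException {
        StandaloneLoopSketch shell = new StandaloneLoopSketch();
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        System.out.print("> ");
        String line;
        while ((line = in.readLine()) != null && !line.trim().equals("quit")) {
            System.out.println(shell.handle(line));
            System.out.print("> ");
        }
    }
}
```

Keeping the parsing in a separate handle method makes the loop trivial and lets the command logic be tested without a console.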

This type of capability sort of existed in the old JBoss ON 1.x code base - but its old (now obsolete) plugin model was never this modular and could never have been this easily embedded in so many different ways.