Thoughts From A Management Platform Developer: 2011

Friday, December 16, 2011

Managing Compliance Across Multiple Resources

In the latest RHQ project release, and Red Hat's JBoss Operations Network product, a new feature called "drift monitoring" has been introduced.

If you've seen my previous blog entry, you know that it (along with its demo) described how you can monitor changes in files belonging to a single resource. If a resource's files deviate from a trusted snapshot, that resource is considered out of compliance, and the UI indicates when this happens.

This pinning/compliance feature within the drift subsystem can be combined with drift templating to allow you to pin a single snapshot of content to multiple resources, so that all of those resources share one snapshot. In other words, if you have a cluster of app servers and they all have the same web application deployed, you can pin a snapshot of the compliant web application to a drift template that all of your app servers use when they scan for drift. Then, if any server in that cluster drifts away from that shared, compliant snapshot, that server will be flagged as "out of compliance".
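As a rough sketch of the idea, assume each server's drift scan produces a snapshot that maps relative file paths to content hashes (a hypothetical representation, not RHQ's actual data model). Sharing one pinned snapshot then amounts to comparing every server's current snapshot against the same template snapshot. The class name, server names and hash values below are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

public class SharedSnapshotCompliance {

    /**
     * Returns the names of servers whose current snapshot differs from the
     * single pinned snapshot they all share. Snapshots map relative file
     * paths to content hashes (hypothetical representation).
     */
    static List<String> outOfCompliance(Map<String, String> pinnedSnapshot,
                                        Map<String, Map<String, String>> serverSnapshots) {
        List<String> drifted = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> server : serverSnapshots.entrySet()) {
            // Map equality compares every key and value, so added, removed
            // and changed files all count as drift.
            if (!Objects.equals(pinnedSnapshot, server.getValue())) {
                drifted.add(server.getKey());
            }
        }
        return drifted;
    }

    public static void main(String[] args) {
        Map<String, String> pinned = Map.of(
            "myapp.war/WEB-INF/web.xml", "a1b2",
            "myapp.war/index.jsp", "c3d4");

        Map<String, Map<String, String>> cluster = new TreeMap<>();
        cluster.put("server-1", Map.of("myapp.war/WEB-INF/web.xml", "a1b2", "myapp.war/index.jsp", "c3d4"));
        cluster.put("server-2", Map.of("myapp.war/WEB-INF/web.xml", "ffff", "myapp.war/index.jsp", "c3d4"));
        cluster.put("server-3", Map.of("myapp.war/WEB-INF/web.xml", "a1b2")); // index.jsp was deleted

        System.out.println(outOfCompliance(pinned, cluster)); // prints [server-2, server-3]
    }
}
```

The point of sharing the snapshot is that there is exactly one definition of "compliant" for the whole cluster, rather than one per server.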

To see how this works, watch my "Managing Compliance Across Multiple Resources" demo.

Thursday, December 15, 2011

Managing Compliance with Drift Pinning

In the latest RHQ project release, and Red Hat's JBoss Operations Network product, a new feature called "drift monitoring" has been introduced.

Drift monitoring provides the ability to monitor changes in files and to determine if those files are in or out of compliance with a desired state. In other words, if I installed an application and someone changes files in that installation, I can be told when those changes occurred and I can analyze those changes.

I put together another demo for my "drift demo series" that illustrates this concept:

Managing Compliance with Drift Pinning

Here is how it works: suppose you have a set of files on your machine and you don't want those files changed. In other words, the files you have now are "in compliance" with what you want, and this set of compliant files should not be touched. In RHQ/JON, you would create a new drift definition and pin that definition to your current snapshot of files. This pinning effectively marks that snapshot of files as the "compliant" version. Any changes now made to the tracked files will be considered drift and out of compliance. In the graphical user interface, you can see what has gone out of compliance, and you can drill down to see which files drifted and even which parts of those files drifted.
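The RHQ agent's actual scanning code isn't shown here, so the following is only a rough sketch of the pinning idea. It assumes a snapshot is simply a map from relative file path to a SHA-256 hash of the file's contents (an assumption for illustration, not necessarily how RHQ stores snapshots). Pinning means remembering one snapshot; any later scan that differs from it is drift. Class and method names are hypothetical, and the code needs Java 17 for `HexFormat`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.stream.Stream;

public class DriftCheck {

    /** Takes a snapshot of a directory: relative file path -> SHA-256 of its contents. */
    static Map<String, String> snapshot(Path root) {
        Map<String, String> result = new TreeMap<>();
        try (Stream<Path> walk = Files.walk(root)) {
            List<Path> files = walk.filter(Files::isRegularFile).toList();
            for (Path file : files) {
                result.put(root.relativize(file).toString(), sha256(Files.readAllBytes(file)));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    static String sha256(byte[] data) {
        try {
            return HexFormat.of().formatHex(MessageDigest.getInstance("SHA-256").digest(data));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("every Java SE platform must support SHA-256", e);
        }
    }

    /** Paths that were added, removed or changed compared with the pinned snapshot. */
    static Set<String> drift(Map<String, String> pinned, Map<String, String> current) {
        Set<String> allPaths = new TreeSet<>(pinned.keySet());
        allPaths.addAll(current.keySet());
        Set<String> drifted = new TreeSet<>();
        for (String path : allPaths) {
            // a null on either side means the file was added or removed
            if (!Objects.equals(pinned.get(path), current.get(path))) {
                drifted.add(path);
            }
        }
        return drifted;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("drift-demo");
        Files.writeString(dir.resolve("app.properties"), "port=8080\n");
        Files.writeString(dir.resolve("README"), "hello\n");

        Map<String, String> pinned = snapshot(dir);                      // "pin" today's state

        Files.writeString(dir.resolve("app.properties"), "port=9090\n"); // someone edits a file
        System.out.println(drift(pinned, snapshot(dir)));                 // prints [app.properties]
    }
}
```

Comparing hashes rather than timestamps means a file that is touched but not actually changed is not reported as drift.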

This pinning/compliance feature within the drift subsystem can be combined with drift templating to allow you to pin a single snapshot of content to multiple resources, giving you a single snapshot that all of those resources share. In other words, if I have a cluster of app servers and they all have the same web application deployed, I can pin a snapshot of the compliant web application to a drift template that all my app servers use when they scan for drift. Then, if any server in my cluster drifts away from that shared, compliant snapshot, that server will be flagged as "out of compliance". I will be posting another demo to illustrate this concept in the near future.

Thursday, December 8, 2011

JBoss ON 3.0 Released and a Drift Demo

Red Hat has released the next version of JBoss Operations Network. This is version 3.0 and incorporates the new look and feel of the GWT user interface that RHQ introduced recently. The bundle UI has been enhanced a bit as well (it was the only GWT-based portion of the UI that was in JBoss ON 2.4.1).

A new feature introduced in this JBoss ON 3.0 release is drift monitoring. If you have been following the progress of the RHQ upstream project, you'll know that this drift monitoring feature gives administrators the ability to keep a lookout for unintended changes (either accidental or malicious) to their installed software and other file content.

I posted a demo showing the basics of the new drift feature:

"The Basics of Drift Monitoring" Demo

I plan on posting some more demos on drift in the near future.

Thursday, October 6, 2011

A Problem With Paging And JPA

I just came across an insidious problem in some code that went unnoticed for a while because it was hard to spot. I only found it because I happened to write a unit test that exercised it (I guess unit tests are a good thing!).

The problem has to do with paging through database data and the query used to do the paging.

The code in question was using a JPA query that looked like this:

SELECT r.id, r.foo
  FROM Table t
  JOIN t.relation r
 WHERE t.id = :tid

This query has the potential to return many hundreds of rows. The code using this query, therefore, used paging to retrieve chunks of the data, one page at a time. This paging code did so by setting the appropriate start/max rows via the JPA API:

query.setFirstResult(startRow);
query.setMaxResults(pageSize);

In my unit test, I set up test data that contained over 1,000 rows to be returned by this query. Much to my surprise, my unit test was failing because it wasn't getting back the full number of rows expected. Worse yet, across multiple runs of my test (each of which paged through what was supposed to be the entire data set), it got back what seemed to be a random number of rows each time; it wasn't even consistent! Sometimes the unit test would get back 800 rows, other times 900. Sometimes it would work with 2 or 3 pages of data, but would break with more pages. The reason this was hard to detect (without a unit test) was that it only showed up when there were many pages of data to retrieve: first you needed a lot of data, and second you needed to actually scan the data and notice that you had fewer rows than expected.

It turns out that SQL does not guarantee the order in which a query returns its results unless the query has an ORDER BY clause, and some database implementations really do return rows in a different order from one execution to the next. Because this query had no ORDER BY clause, each time the paging code ran the query to obtain a page of data, the rows could come back in a different order. Code that runs the query multiple times to obtain different pages (that is, with different firstResult/maxResults settings) therefore has no guarantee that a page won't contain rows already returned by a previous query execution, which in turn means that other rows may never be returned at all.
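To make the failure mode concrete, here is a small in-memory simulation (no database or JPA involved; all names are hypothetical). It exaggerates the behavior by reshuffling the whole result set on every execution when there is no ORDER BY, which a real database may or may not do, but any order is legal in that case. Each page is then cut out with the same first/max arithmetic that `setFirstResult`/`setMaxResults` express.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

public class PagingDemo {

    /**
     * Simulates one execution of a paged query. With ordered == false, each
     * execution may return the rows in a different order, as a database is
     * allowed to do when the query has no ORDER BY clause.
     */
    static List<Integer> fetchPage(List<Integer> table, int firstResult, int maxResults,
                                   boolean ordered, Random random) {
        List<Integer> rows = new ArrayList<>(table);
        if (ordered) {
            Collections.sort(rows);             // like ORDER BY r.id
        } else {
            Collections.shuffle(rows, random);  // no ORDER BY: any order is legal
        }
        int end = Math.min(firstResult + maxResults, rows.size());
        return firstResult >= end ? List.of() : rows.subList(firstResult, end);
    }

    /** Pages through the whole "table" and returns the distinct rows seen. */
    static Set<Integer> pageThrough(List<Integer> table, int pageSize, boolean ordered, Random random) {
        Set<Integer> seen = new HashSet<>();
        for (int first = 0; ; first += pageSize) {
            List<Integer> page = fetchPage(table, first, pageSize, ordered, random);
            seen.addAll(page);
            if (page.size() < pageSize) {       // a short (or empty) page means we're done
                return seen;
            }
        }
    }

    public static void main(String[] args) {
        List<Integer> table = new ArrayList<>();
        for (int id = 1; id <= 1000; id++) {
            table.add(id);
        }
        Random random = new Random(42);

        // Typically well short of 1000: pages overlap, so some rows are never seen.
        System.out.println("without ORDER BY: " + pageThrough(table, 100, false, random).size());
        // Always all 1000 rows.
        System.out.println("with ORDER BY:    " + pageThrough(table, 100, true, random).size());
    }
}
```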

I found this behavior in both Postgres 8 and Oracle 10.

The good news is once I added an ORDER BY clause to that query, all worked well and my unit test started passing:

SELECT r.id, r.foo
  FROM Table t
  JOIN t.relation r
 WHERE t.id = :tid
 ORDER BY r.id

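So the moral of the story is: if you ever use paging queries, always double-check your results by testing with many pages of data, and if you see problems, add an ORDER BY clause to your query to see if it fixes the issue. For the order to be fully deterministic, the ORDER BY should include a column whose values are unique, such as a primary key.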
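One way to do that double-checking in a unit test is to collect every page you retrieve and then verify two things: no row shows up twice, and the number of distinct rows matches what you expected. The helper below is a hypothetical sketch of such a check; it doesn't care how the pages were fetched, so you could feed it the results of your own paging loop.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PagingCheck {

    /**
     * Checks pages retrieved one at a time against the expected total row count.
     * Returns a list of problems; an empty list means the paging looks correct.
     */
    static <T> List<String> checkPages(List<List<T>> pages, int expectedTotal) {
        List<String> problems = new ArrayList<>();
        Set<T> seen = new HashSet<>();
        int total = 0;
        for (int i = 0; i < pages.size(); i++) {
            for (T row : pages.get(i)) {
                total++;
                if (!seen.add(row)) {
                    problems.add("page " + i + " repeats row " + row);
                }
            }
        }
        if (seen.size() != expectedTotal) {
            problems.add("expected " + expectedTotal + " distinct rows but got " + seen.size()
                         + " (" + total + " rows in total)");
        }
        return problems;
    }

    public static void main(String[] args) {
        List<List<Integer>> good = List.of(List.of(1, 2), List.of(3, 4), List.of(5));
        List<List<Integer>> bad = List.of(List.of(1, 2), List.of(2, 4), List.of(5)); // row 3 missing, row 2 twice
        System.out.println(checkPages(good, 5)); // prints []
        System.out.println(checkPages(bad, 5));
    }
}
```

Checking for duplicates as well as the total matters: with unordered paging the total number of rows returned can look right while some rows are repeated and others are missing.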


Wednesday, July 6, 2011

Telling GWT To Ignore Certain Classes

We recently hit a problem in RHQ that required us to learn about a feature in GWT (specifically the GWT compiler) that was very useful and I thought I would blog about it.

First, some background. In RHQ, we have split up the source code into several "modules". Each module is built by Maven, and artifacts are produced (for example, some modules will output a jar file after the build completes). Modules whose artifacts other modules depend on are built first, and the modules that depend on them are built afterwards; Maven knows how to maintain the proper dependency hierarchy so it can build modules in the proper order. After our entire suite of modules is built, the build system assembles all the module artifacts into the RHQ distribution. As you can see, there is nothing special here; any complex application needs a build system that does the same thing.
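Maven's reactor works out this build order for you. Conceptually, that ordering is a topological sort of the module dependency graph; the following is a simplified sketch of the idea (a depth-first topological sort), not Maven's implementation. The module names are loosely based on the ones mentioned in this post and are otherwise hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class BuildOrder {

    /** Returns the modules ordered so that every module comes after the modules it depends on. */
    static List<String> buildOrder(Map<String, List<String>> dependsOn) {
        List<String> order = new ArrayList<>();
        Set<String> done = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        for (String module : dependsOn.keySet()) {
            visit(module, dependsOn, done, inProgress, order);
        }
        return order;
    }

    private static void visit(String module, Map<String, List<String>> dependsOn,
                              Set<String> done, Set<String> inProgress, List<String> order) {
        if (done.contains(module)) {
            return;
        }
        if (!inProgress.add(module)) {
            throw new IllegalStateException("dependency cycle involving " + module);
        }
        for (String dependency : dependsOn.getOrDefault(module, List.of())) {
            visit(dependency, dependsOn, done, inProgress, order); // build dependencies first
        }
        inProgress.remove(module);
        done.add(module);
        order.add(module);
    }

    public static void main(String[] args) {
        Map<String, List<String>> modules = new LinkedHashMap<>();
        modules.put("dist", List.of("server", "agent", "cli", "gwt-client"));
        modules.put("server", List.of("domain"));
        modules.put("agent", List.of("domain"));
        modules.put("cli", List.of("domain"));
        modules.put("gwt-client", List.of("domain"));
        modules.put("domain", List.of());

        System.out.println(buildOrder(modules));
        // prints [domain, server, agent, cli, gwt-client, dist]
    }
}
```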

One of RHQ's modules is what we call the "domain" module. This is simply the module that contains the source code for all of our domain objects that map to our data model (in other words, the domain objects simply represent the set of all of our database entities, such as Users, or Roles, or Resources, or Alerts, etc.)

These domain objects are essentially the basic building blocks on which all of RHQ is built. They are used throughout the RHQ codebase: in the agent, in the server, in the remote CLI and in the GWT client. All of these modules depend on the domain module, so the domain module is one of the first ones built.

Since our GWT client needs these domain objects, the domain module needs to not only be passed through the Java compiler, it needs a second pass through the GWT compiler.

Here is where the problem comes in. Some of our domain objects need to import certain Java classes that GWT doesn't support. Because the domain objects are used everywhere, they end up needing to provide functionality that is required by the server and agent, not just by the GWT client. But this functionality sometimes requires JRE features that are not emulated, and thus not available, in the GWT runtime (such as java.io or java.sql classes).

But how can we do this? If we import these GWT-unsupported JRE classes into some of the domain classes, the GWT client cannot load the domain jar and the GWT client stops working. But without those JRE classes, the functionality that the domain module needs to provide to the other modules (like the server or agent) is broken. It is a catch-22. So the question becomes: how can we provide all of the domain module functionality to all dependent modules, but remove from the GWT client module only those portions that are not GWT compatible?

One idea was to create another module - a second domain module - that was compatible with the GWT client (essentially it would be the original domain module, minus the GWT-incompatible code). We could possibly do this by refactoring the original domain objects and creatively using inheritance. But it would give us two domain jars. Adding yet another module to the build isn't something we wanted to do.

As it turns out, after being perplexed at how best to design around this problem, we found out that GWT provides a very, very easy solution. You can tell the GWT compiler to filter out and exclude certain classes when it compiles the Java source into JavaScript. Since those classes don't get compiled into JavaScript, the GWT client never sees them or tries to load them. Without having to split our domain module into two separate modules, we get the effect of having two different domain libraries: one for the GWT client and one for the rest of the RHQ system.

Now we can use normal Java inheritance to share common code while at the same time we ensure that only certain subclasses will use the non-GWT-supported features of the JRE. We can then exclude those subclasses from the GWT compilation thus allowing the GWT client to load the domain module, minus those classes it couldn't use anyway.
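Here is a minimal sketch of that inheritance split, using hypothetical class names (not RHQ's actual domain classes). The base class holds only plain fields that GWT can handle; the subclass adds behavior that needs `java.io`, and only that subclass would be named in an `<exclude>` element. The classes are nested here only so the sketch fits in one file; in a real project each would be a top-level class in its own source file so that an exclude pattern can target it by file name.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.Serializable;

public class DomainSplitSketch {

    /** GWT-safe base class: plain fields only, no java.io or java.sql. (Hypothetical name.) */
    static class FileContent implements Serializable {
        protected String hash;
        protected long size;

        FileContent() { }  // GWT-RPC serializable types need a zero-argument constructor

        FileContent(String hash, long size) {
            this.hash = hash;
            this.size = size;
        }

        String getHash() { return hash; }
        long getSize() { return size; }
    }

    /**
     * Server/agent-only subclass that uses java.io. This is the kind of class
     * that would be listed in the GWT module's exclude patterns. (Hypothetical name.)
     */
    static class FileContentBits extends FileContent {
        private final byte[] bits;

        FileContentBits(String hash, byte[] bits) {
            super(hash, bits.length);
            this.bits = bits.clone();
        }

        InputStream openStream() {
            return new ByteArrayInputStream(bits);
        }
    }

    public static void main(String[] args) throws IOException {
        FileContentBits full = new FileContentBits("abc123", "hello".getBytes());
        FileContent shared = full;  // code shared with the GWT client only needs the base type
        System.out.println(shared.getSize());                             // prints 5
        System.out.println(new String(full.openStream().readAllBytes())); // prints hello
    }
}
```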

To use this feature, we had to add the <exclude> XML element to our GWT module's XML file in order to tell the GWT compiler which Java classes to ignore:


<entry-point class="org.rhq.core.client.RHQDomain" />
<source path="domain">
   <exclude name="**/DriftFileBits.*" />
</source>

This essentially tells the GWT compiler to compile all of our domain module's classes except for the DriftFileBits class. If, in the future, we have to introduce other classes that are not supported by GWT, we can add more <exclude> elements to filter out those additional Java packages or individual classes.
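GWT's source filtering follows the same wildcard rules as Ant's FileSet patterns. As a rough way to check which source files a pattern like `**/DriftFileBits.*` would catch, you could approximate it with Java NIO's glob matcher. This is only an approximation, not GWT's own matcher: NIO's `**/` requires at least one directory before the file name, whereas Ant's `**/` also matches zero directories. The paths below are hypothetical and assume a Unix-style path separator.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.List;

public class ExcludePatternCheck {

    /** True if the source path matches the given glob-style pattern (approximating an exclude). */
    static boolean isExcluded(String pattern, String sourcePath) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + pattern);
        return matcher.matches(Path.of(sourcePath));
    }

    public static void main(String[] args) {
        List<String> sources = List.of(
            "org/example/domain/drift/DriftFileBits.java",  // hypothetical paths
            "org/example/domain/drift/DriftFile.java",
            "org/example/domain/auth/Subject.java");
        for (String source : sources) {
            System.out.println(source + " excluded? " + isExcluded("**/DriftFileBits.*", source));
        }
    }
}
```

A check like this can be a cheap way to confirm that a new pattern excludes exactly the classes you meant to exclude, and nothing that the GWT client still needs.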

One negative aspect of this is that we have to be careful to not "leak" references to those excluded classes into our GWT API or GWT implementation classes. But even if we do, it is caught rather quickly when the GWT client attempts to load the application.

Monday, June 20, 2011

Deploying RHQ Bundles To Non-Platform Resources

I recently completed the initial implementation of a feature that folks have been asking for lately, so I decided to put together a quick demo to show it.

First, some background. The initial RHQ Bundle Provisioning feature (which I have blogged about in the past) supported the ability to deploy bundle distributions (which are, essentially, just files of generic content) to a group of platforms. That is, given a set of files bundled together, RHQ can ship out those files to a group of platform resources where the agents on those platforms will unbundle the files to the root file system.

This new feature (submitted as enhancement request BZ 644328) now allows you to target your bundle deployments to a group of non-platform resources if you wish. For example (as the demo shows), we can now target bundle deployments to JBossAS Server resources. This means you can bundle up a WAR or an EAR, and push that bundle to a group of JBoss AS Servers such that the WAR or EAR gets deployed directly to the deploy/ directory (and hence gets deployed into your application servers).

The demo is 10 minutes long and shows what a simple workflow looks like to deploy a WAR bundle to a group of JBoss EAP application servers.

View the demo here.

[Note: unlike my previous demos, which were built on a Windows laptop using Wink and formatted as Flash, I decided to try Istanbul on my Fedora desktop. So this demo is in .ogg format rather than Flash. Hopefully, this doesn't limit the audience that is able to view the demo.]

Monday, May 23, 2011

Detaching Hibernate Objects to pass to GWT

One thing we encountered pretty quickly when we started our GWT work was the fact that you can't serialize objects over the wire from server to GWT client if those objects were obtained via a Hibernate/JPA entity manager.

If you've ever worked with Hibernate/JPA, you'll know that when you get back entity POJOs whose fields are not loaded (i.e. marked for lazy loading and you didn't ask for the data to be loaded), your entity POJO instance will have Hibernate proxies where you would expect a "null" object to be (this is to allow you to load the data later, if your object is still attached to the entity manager session).

Having these proxies even after leaving a JPA entity manager session is a problem in the GWT world because the GWT client sitting in your browser doesn't have Hibernate classes available to it! Trying to send these entity POJO instances that have references to Hibernate proxies causes serialization errors and your GWT client will fail to operate properly.

This is a known issue and is discussed here.

We pretty quickly decided against using DTOs. As that page above mentioned, "if you have many Hibernate objects that need to be translated, the DTO / copy method creation process can be quite a hassle". We have a lot of domain objects that are used server side in RHQ. There was no reason why we shouldn't be able to reuse our domain objects both server side and client side; introducing DTOs just so we could work around this serialization issue seemed ill-advised. It would have just added bloat and unnecessary complexity.

I can't remember how mature the Gilead project was at the time we started our GWT work, or maybe we just didn't realize it existed. Gilead does require your domain objects and server-side impl classes to extend certain Java classes (LightEntity, for example), which has the slight downside of requiring you to modify all your domain objects. In any event, we do not use Gilead to detach Hibernate proxies.

RHQ's solution was to write our own "Hibernate Detach Utility". This is a single static utility that you use to process your objects just prior to sending them over the wire to your GWT client. Essentially it scrubs your object of all Hibernate proxies, cleaning it such that it can be serialized over the wire successfully.
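RHQ's actual HibernateDetachUtility is not reproduced here; the following is only a much-simplified sketch of the general idea, not RHQ's code. To keep it runnable without Hibernate on the classpath, it uses a hypothetical `LazyProxy` marker interface to stand in for a lazy-loading proxy; a real version would instead test for Hibernate's own types (for example `org.hibernate.proxy.HibernateProxy`, or `Hibernate.isInitialized(...)`). It walks the object graph with reflection, nulls out any field holding a proxy, and only handles plain fields and `Collection`s (not maps or arrays).

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class DetachSketch {

    /** Hypothetical stand-in for a lazy-loading proxy type. */
    interface LazyProxy { }

    /** Scrubs, in place, every field reachable from root that holds a LazyProxy. */
    static void detach(Object root) {
        detach(root, Collections.newSetFromMap(new IdentityHashMap<>()));
    }

    private static void detach(Object obj, Set<Object> visited) {
        if (obj == null || obj instanceof LazyProxy || !visited.add(obj)) {
            return; // nothing to scrub, or already visited (guards against cycles)
        }
        if (obj instanceof Collection<?> collection) {
            for (Object element : collection) {
                detach(element, visited);
            }
            return;
        }
        if (obj.getClass().getName().startsWith("java.")) {
            return; // don't reflect into JDK types such as String or Integer
        }
        for (Class<?> c = obj.getClass(); c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field field : c.getDeclaredFields()) {
                if (Modifier.isStatic(field.getModifiers()) || field.getType().isPrimitive()) {
                    continue;
                }
                try {
                    field.setAccessible(true);
                    Object value = field.get(obj);
                    if (value instanceof LazyProxy) {
                        field.set(obj, null);      // replace the proxy with a plain null
                    } else {
                        detach(value, visited);
                    }
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException("cannot scrub " + field, e);
                }
            }
        }
    }

    // A tiny hypothetical domain model for the demo.
    static class Role { String name = "admin"; }
    static class UninitializedRole extends Role implements LazyProxy { } // simulates a lazy proxy
    static class User {
        String name = "mazz";
        Role role = new UninitializedRole();
        List<Role> roles = new ArrayList<>(List.of(new Role()));
    }

    public static void main(String[] args) {
        User user = new User();
        detach(user);
        System.out.println(user.role);         // prints null
        System.out.println(user.roles.size()); // prints 1
    }
}
```

The trade-off versus DTOs is that the domain classes stay untouched; the cost is a reflective walk over each object graph just before it goes over the wire.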

We also used this when we originally developed a web services interface to the RHQ remote API.

Here is the HibernateDetachUtility source code in case you are interested in seeing how we do it; maybe you could use this in your own GWT/Hibernate application. I think it is reusable, since not much custom RHQ stuff is going on in here.

RHQ 4 Has Been Released

It has been a very long road for this one, but we managed to release RHQ 4.

We managed to standardize on GWT as our user interface framework. Here's an example of what the new GWT-based UI looks like:

There are still a few JSP pages around, but for the most part, the RHQ GUI is now a GWT application with SmartGWT components. One of the fun parts of this job is to learn new and exciting technologies (though I guess "new" is relative - but I still consider GWT a "new" GUI technology, compared to things like Struts and JSP).

Take a look, give it a test drive and see what you think.

Saturday, February 5, 2011

Alerting and Remote Script Execution

RHQ has the ability to invoke operations on any resource when it triggers alerts. When you combine this with the ability of the Script plugin to run remote scripts via its "Execute" operation, you have a very powerful mechanism to integrate your own processes and rules to help correct or work around abnormal conditions that occur in your managed environment.

Because I've heard several people ask if RHQ has this ability, I put together a Flash demo that shows how you do it. The demo shows how you can execute any script on a remote machine when an alert is triggered by RHQ. For example, if you set up an alert that detects when your app server is using an abnormally large number of threads (a possible indication of heavy load), you can have RHQ execute a custom script on your app server machine to help alleviate problems that might occur due to that condition (such a script could be one that reconfigures a load balancer to help redirect load away from your app server).
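In RHQ you set all of this up in the UI, and the script runs on the remote machine through the agent's Script plugin. Purely to illustrate the shape of the idea (alert condition, then operation, then script), here is a hypothetical sketch that collapses everything into one local process; none of these types are RHQ APIs, and the threshold and script path are made up. The runner is passed in so the demo can do a dry run instead of actually launching anything.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.IntPredicate;

public class AlertScriptSketch {

    /** Hypothetical alert definition: a condition on one metric plus the command to run when it fires. */
    record AlertDefinition(String name, IntPredicate condition, List<String> command) { }

    /** Checks one metric value and, if the alert condition is met, hands the command to the runner. */
    static boolean evaluate(AlertDefinition alert, int metricValue, Consumer<List<String>> runner) {
        if (!alert.condition().test(metricValue)) {
            return false;                // condition not met: no alert, no script
        }
        runner.accept(alert.command());  // condition met: invoke the "operation"
        return true;
    }

    /** A runner that launches the command as a local process and waits for it to finish. */
    static void runProcess(List<String> command) {
        try {
            int exitCode = new ProcessBuilder(command).inheritIO().start().waitFor();
            System.out.println(command + " exited with " + exitCode);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        AlertDefinition tooManyThreads = new AlertDefinition(
            "Too many threads",
            threads -> threads > 500,                                  // hypothetical threshold
            List.of("/opt/scripts/redirect-load.sh", "appserver-1"));  // hypothetical script

        // Print instead of launching so the demo runs anywhere; pass AlertScriptSketch::runProcess to run for real.
        Consumer<List<String>> dryRun = cmd -> System.out.println("would run " + cmd);
        evaluate(tooManyThreads, 120, dryRun);  // below the threshold: nothing happens
        evaluate(tooManyThreads, 612, dryRun);  // prints would run [/opt/scripts/redirect-load.sh, appserver-1]
    }
}
```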

The use-cases for this feature are virtually endless. Any set of alert conditions on any managed resource can trigger the execution of any script you have. And as the demo illustrates, it is really easy to set this up within RHQ.

Wednesday, January 26, 2011

Bundle Provisioning Via RHQ

A number of enhancements and some UI cleanup were added to the RHQ Bundle/Provisioning feature, so I figured now would be a good time to re-introduce it in a new blog entry. I also put together a Flash demo if you would like to see the UI in action.

Let me recap what this RHQ Bundle/Provisioning feature is all about. RHQ allows you to bundle up a set of files and push them out to remote machines. You can install and upgrade these sets of files as well as revert back to a previous version of the files or purge the bundle files completely. You sometimes see this mentioned as the "Provisioning" feature, and other times you will see it referred to as the "Bundle" subsystem. (I prefer the term "Bundle" since that is the term that the RHQ user interface uses).

There are a few concepts you must know in order to understand how RHQ does its thing. This is covered on the wiki, but I'll try to explain it briefly here, too.

First is the concept of "bundle". A "bundle" is a logical concept and basically refers to an application ("Pet Store Application" or "My Wiki Server"). A bundle has one or more "versions". A "bundle version" refers to an actual set of files that you want to push out to a set of remote machines. Think of it as your application distribution. Each bundle version has its own "recipe" which tells RHQ what files exist in the bundle, and how to configure and provision those files to the remote machines. Developers or application packagers are responsible for writing the recipe and 'bundling up' the application's files (hence the name 'bundle') with the recipe into an RHQ "bundle version".

What you do with a bundle version brings us to the next set of concepts. A bundle "destination" is associated with a specific bundle and is simply a place (or places) where you want to deploy your bundle. A destination specifies two things - a group (which contains one or more remote machines) and a destination directory (which specifies where on the remote machines' filesystems the bundle files should go). Once you have a "bundle destination" in place, you can begin to deploy one or more of its bundle's versions to that destination. A bundle "deployment" represents one deployment of a bundle version to a destination.
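To summarize how these concepts relate, here is a hypothetical model using Java records (these are not RHQ's actual domain classes, and the fields are simplified). The one rule it enforces comes straight from the description above: a deployment pairs a bundle version with a destination, and both must belong to the same bundle. The example values follow the example below, with made-up machine names and recipe file name.

```java
import java.util.List;

public class BundleModel {

    /** A logical application, e.g. "My Application". */
    record Bundle(String name) { }

    /** An actual set of files plus the recipe describing how to provision them. */
    record BundleVersion(Bundle bundle, String version, String recipe, List<String> files) { }

    /** Where a bundle goes: a group of machines plus a directory on each of them. */
    record BundleDestination(Bundle bundle, String name, List<String> machineGroup, String deployDir) { }

    /** One deployment of a bundle version to a destination. */
    record BundleDeployment(BundleVersion version, BundleDestination destination) {
        BundleDeployment {
            if (!version.bundle().equals(destination.bundle())) {
                throw new IllegalArgumentException("version and destination belong to different bundles");
            }
        }
    }

    public static void main(String[] args) {
        Bundle myApp = new Bundle("My Application");
        BundleVersion v1 = new BundleVersion(myApp, "1.0", "deploy.xml", List.of("myapp.war"));
        BundleDestination qa = new BundleDestination(
            myApp, "QA destination", List.of("qa-windows", "qa-linux"), "/home/mazz/opt/myapp");

        BundleDeployment deployment = new BundleDeployment(v1, qa);
        System.out.println(deployment.version().version() + " -> " + deployment.destination().name()
                           + " " + deployment.destination().machineGroup());
        // prints 1.0 -> QA destination [qa-windows, qa-linux]
    }
}
```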

It may make more sense if I give an example. Suppose I have a web application (call it "My Application") that runs inside a JBoss application server. This is my "bundle". I actually have two versions of my application, 1.0 and 2.0. These are my "bundle versions".

Now suppose that I have a QA environment that consists of two machines - a Windows machine and a Linux machine. I want to test my application on my two QA machines. So I need to install my application on both of them. I want to install my application on each machine's "/home/mazz/opt/myapp" directory. This group of two QA machines, along with the destination directory, is a "bundle destination" for my application bundle (call it the "QA destination"). I also have a group of three Linux machines that make up my production environment. After my application passes all tests, I want to deploy my application to that production environment in the "/opt" directory. That group of three production machines, along with the "/opt" directory specification, is another bundle destination associated with my application bundle - call it the "production destination".

Once I tell RHQ to deploy the "1.0" bundle version of my application to my "QA destination", I will have a "bundle deployment". This bundle deployment will be considered the "live" deployment because it's the last one I pushed out. I can then test that version while it is on my QA machines. Suppose I find that I want to upgrade my QA environment with the newer "2.0" version of my application. I simply deploy that bundle version to the "QA destination" and now I have a second "bundle deployment". This second deployment is now considered live. If I find that I do not like this newer "2.0" version of my application, I can ask RHQ to revert to the previously live deployment (which was my "1.0" bundle version). This revert becomes yet another "bundle deployment" (the third), but it restores the "1.0" bundle version content. Once I pass all of my QA tests, I can then deploy whatever bundle version I deem appropriate to my "production destination".
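The key point in that walkthrough is that a revert does not erase history; it adds a new deployment whose content is that of the previously live one. A minimal sketch of that bookkeeping for a single destination follows. It is hypothetical, and it assumes "previously live" simply means the deployment immediately before the current one; RHQ's actual revert rules may be more involved.

```java
import java.util.ArrayList;
import java.util.List;

public class DestinationHistory {

    /** One bundle deployment: its sequence number, which version it carries, and whether it was a revert. */
    record Deployment(int number, String version, boolean revert) { }

    private final List<Deployment> deployments = new ArrayList<>();

    /** Deploying a bundle version creates a new deployment, which becomes the live one. */
    Deployment deploy(String version) {
        Deployment d = new Deployment(deployments.size() + 1, version, false);
        deployments.add(d);
        return d;
    }

    /** Reverting also creates a new deployment, carrying the content of the previously live deployment. */
    Deployment revert() {
        if (deployments.size() < 2) {
            throw new IllegalStateException("nothing to revert to");
        }
        String previous = deployments.get(deployments.size() - 2).version();
        Deployment d = new Deployment(deployments.size() + 1, previous, true);
        deployments.add(d);
        return d;
    }

    /** The live deployment is simply the most recent one, or null if nothing was deployed yet. */
    Deployment live() {
        return deployments.isEmpty() ? null : deployments.get(deployments.size() - 1);
    }

    public static void main(String[] args) {
        DestinationHistory qa = new DestinationHistory();
        qa.deploy("1.0");                 // deployment #1, live
        qa.deploy("2.0");                 // deployment #2, now live
        System.out.println(qa.revert());  // prints Deployment[number=3, version=1.0, revert=true]
    }
}
```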

Most of what I describe above is actually demonstrated in my Flash demo. The only thing I do not show in the demo is the use of a second "production destination", but it is the same effort to deploy to a second destination as it is to deploy to the first destination.

One new feature that has been introduced to RHQ is the ability to "purge" a destination of all bundle content. If, for example, you want to remove all bundle files completely from the QA destination, you can ask RHQ to purge that destination. What RHQ will do is remove all bundle content from the remote machines that were associated with that destination.

Another new feature that has been added is the ability for RHQ to deploy a bundle into an already existing deployment directory that may have other non-managed content that should be left alone. Such would be the case if you want to deploy an EAR or WAR to a JBossAS deploy/ directory (which obviously has other files inside of it). This deserves some additional explanation.

Typically, you will want to deploy a set of application files into its own directory on some file system. For example, if you have a JBoss application server, you want to install it in something like "/opt/my-jboss". All of your application server files are in that directory, but no other files are in there. If you want to remove your JBoss application server installation, it is as simple as "rm -rf /opt/my-jboss".

However, what if you deployed a bundle version in that directory already, but you then upgrade that bundle deployment with a new bundle version? In this case, you will already have files in /opt/my-jboss (the original bundle version content). RHQ will actually overwrite, back up or ignore conflicting files that it finds, following strict upgrade rules. If, for example, RHQ finds files in that /opt/my-jboss directory that don't belong to the new bundle version, they will be removed. RHQ calls this "managing the root directory".

This is usually what you want if you are deploying a standalone software product. If there are any unknown files in the deployment directory, RHQ has to remove them to make sure the bundle deployment directory is exactly in the state the new bundle version recipe wants it to be. However, this is not what you want if you desire to deploy an EAR or WAR to an already existing JBossAS's deploy/ directory. That's because we already know there will be unrelated files in this deploy/ directory that must remain intact and in place. RHQ must leave any files it finds in that destination directory alone - even though they aren't part of our bundle deployment. In other words, we do not want RHQ to manage this root deployment directory.
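The difference between the two modes can be sketched as a single decision: which files in the destination directory get deleted during an upgrade. The sketch below is hypothetical and simplified (it ignores the backup and conflict rules entirely). It also assumes that, when the root directory is not managed, files laid down by the previous bundle version but absent from the new one are still removed; that part is my assumption, not something stated above.

```java
import java.util.Set;
import java.util.TreeSet;

public class RootDirPolicy {

    /**
     * Decides which files in the destination directory to delete when deploying
     * a new bundle version (hypothetical, simplified).
     *
     * @param existingFiles  files currently in the destination directory
     * @param newBundleFiles files the new bundle version will lay down
     * @param oldBundleFiles files laid down by the previous bundle version (assumed to be tracked)
     * @param manageRootDir  true if the whole directory belongs to the bundle
     */
    static Set<String> filesToDelete(Set<String> existingFiles, Set<String> newBundleFiles,
                                     Set<String> oldBundleFiles, boolean manageRootDir) {
        Set<String> delete = new TreeSet<>();
        for (String file : existingFiles) {
            if (newBundleFiles.contains(file)) {
                continue;  // part of the new version: keep it (to be overwritten)
            }
            // Managed root dir: anything not in the new version goes.
            // Unmanaged: only files the old bundle version put there go; unrelated files stay.
            if (manageRootDir || oldBundleFiles.contains(file)) {
                delete.add(file);
            }
        }
        return delete;
    }

    public static void main(String[] args) {
        Set<String> deployDir = Set.of("myapp-1.0.war", "other-app.war", "jms-service.xml");
        Set<String> newVersion = Set.of("myapp-2.0.war");
        Set<String> oldVersion = Set.of("myapp-1.0.war");

        System.out.println(filesToDelete(deployDir, newVersion, oldVersion, true));
        // prints [jms-service.xml, myapp-1.0.war, other-app.war]
        System.out.println(filesToDelete(deployDir, newVersion, oldVersion, false));
        // prints [myapp-1.0.war]
    }
}
```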

RHQ now supports this by allowing the Ant recipe author to specify the manageRootDir="false" attribute in the rhq:deployment-unit task. This feature is tracked in Bugzilla 659142, and the new attribute is documented on the RHQ wiki.

You can read more about the Provisioning/Bundle feature on the RHQ wiki.
