Thoughts From A Management Platform Developer: 2008

Saturday, December 6, 2008

Bundling the Jopr Agent For Deployment

One of the most requested enhancements for Jopr is an easier way to perform agent installations. Because of this, efforts are underway to ease the pain of agent deployment.

For small Jopr environments, you can take the agent distribution as-is, install it on each machine and individually set up each agent by answering the setup questions at startup. This becomes more tedious the more machines you have. It would be nice if you could bundle your own "golden distro", push it out to all your machines and, with no additional configuration or manual setup required, have the agents "just work".

With the current Jopr code, this is now possible. By following these steps, you can bundle your own agent into one "golden distro". We call it the "golden distro" because it's the one and only distro you will need in order to install all the agents in your environment: every agent installed from it will start up and configure itself with no further setup required.

Unpack the agent distribution that comes out-of-box. This is the starting point for building your own "golden distro".

Next, consider what customized environment variables, if any, you need to set for your agents. If there are any, edit rhq-agent-env.sh (rhq-agent-env.bat for agents that will run on Windows). For example, if there are -XX options you want to pass to your agent's JVM, set RHQ_AGENT_ADDITIONAL_JAVA_OPTS appropriately in the -env script.

The next several steps involve editing the conf/agent-configuration.xml file, so load that file in an editor. This is where you preconfigure your agents so they can start up and successfully configure and initialize themselves without requiring an admin to answer the setup questions.

Set the configuration preference "rhq.agent.configuration-setup-flag" to "true". This tells the agent that, when it starts up, it should not ask any setup questions. Instead, it will immediately use whatever configuration preferences it has and attempt to initialize itself automatically. Of course, setting this to true implies that the rest of your agent's configuration in agent-configuration.xml is complete. Making sure of that is what the next steps are for.

Make sure that the "rhq.agent.name" configuration preference is left undefined (out-of-box, this setting is commented out in agent-configuration.xml; make sure you keep it that way). Leaving this undefined forces the agent to auto-generate its own name. It does this by looking up the agent machine's fully qualified domain name and using that as the agent name. This should ensure that all agents obtain unique names (since, by definition, a fully qualified domain name or IP address is unique within a network).
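The naming fallback can be pictured with a few lines of plain Java. This is a minimal sketch, not the agent's actual code: the method name resolveAgentName and the "localhost" fallback are assumptions made only for the example.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical illustration; this is not the actual RHQ agent code.
class AgentNameSketch {

    // Use the configured name if one was given; otherwise fall back to the
    // local machine's fully qualified (canonical) host name.
    static String resolveAgentName(String configuredName) {
        if (configuredName != null && !configuredName.trim().isEmpty()) {
            return configuredName.trim();
        }
        try {
            return InetAddress.getLocalHost().getCanonicalHostName();
        } catch (UnknownHostException e) {
            // Sketch-only fallback so the example always produces a name.
            return "localhost";
        }
    }

    public static void main(String[] args) {
        System.out.println("Configured:     " + resolveAgentName("web01-agent"));
        System.out.println("Auto-generated: " + resolveAgentName(null));
    }
}
```

If the machine's DNS setup cannot resolve a fully qualified name, getCanonicalHostName() may hand back the textual IP address instead, which is still unique but less readable as an agent name.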

Next, determine which Jopr Server your agents will use as their "Registration Server". When new agents start up, they must communicate with a Jopr Server in order to register themselves into the Jopr environment, so you must decide which of your Jopr Servers will register newly installed agents (we'll call it the "Registration Server"). There is nothing special or different about a "Registration Server" compared to the other servers in your server cloud; you won't see any configuration settings or UI controls that turn some "registration feature" on or off. Any Jopr Server can register any Jopr Agent. However, your golden distro's agent configuration must tell the agent where to find a server so it can bootstrap itself into the Jopr environment. Once you determine which of your Jopr Servers will handle all new agent registrations, set that server's endpoint information in your golden distro's agent-configuration.xml settings:
- rhq.agent.server.transport
- rhq.agent.server.bind-port
- rhq.agent.server.bind-address
- rhq.agent.server.transport-params
Note that this will not necessarily be the Jopr Server assigned as the agent's primary server. Once registration is complete, the agent will be assigned a server failover list, with the first server in the list designated as its "primary". This primary server may or may not be the one you specify here.

If you wish to assign multiple "Registration Servers" to your agents, you can do so by prepopulating a failover list and putting it in your golden distro. If the main Registration Server is down or in maintenance mode, the agents will be able to fail over to the secondary servers defined in your failover list. Create a directory "data" in your distribution and place a file called "failover-list.dat" in it. Each line in that file must be of the form "address:port", where address is the IP or hostname of a Jopr Server and port is the port number the server is listening on. All servers must use the same transport and transport parameters, because the "rhq.agent.server.transport[-params]" settings will be used for all of them.

If you prepackage a failover list in your golden distro, place your main Registration Server (the one you configured in the previous step) at the top of the list, followed by the other servers. If the servers at the top of the list are down, the agent will still be able to register because it simply moves down the list until it finds a server it can talk to. Note that this prepopulated failover list is temporary and is used only the first time the agent starts. Once the agent registers, it will be given a new failover list that overwrites the list shipped in the golden distro. This is what you want, because the server maintains an up-to-date failover list for each agent, and you want the agent to refresh its list every time it registers and starts up.
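To make the "address:port" format and the top-down walk concrete, here is a minimal sketch of reading a failover-list.dat-style file and choosing the first server that answers. It is not the agent's real code; the hostnames, the port 7080, and the isReachable predicate (standing in for an actual connection attempt) are all made-up assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of reading a failover-list.dat-style file and picking
// the first server that answers; not the RHQ agent's actual code.
class FailoverListSketch {

    static final class ServerEndpoint {
        final String address;
        final int port;

        ServerEndpoint(String address, int port) {
            this.address = address;
            this.port = port;
        }

        @Override
        public String toString() {
            return address + ":" + port;
        }
    }

    // Parse "address:port" lines, skipping blanks and keeping file order,
    // since the agent tries the servers from the top of the list down.
    static List<ServerEndpoint> parse(List<String> lines) {
        List<ServerEndpoint> servers = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            int colon = line.lastIndexOf(':');
            if (colon <= 0 || colon == line.length() - 1) {
                throw new IllegalArgumentException("expected address:port, got: " + line);
            }
            int port = Integer.parseInt(line.substring(colon + 1));
            servers.add(new ServerEndpoint(line.substring(0, colon), port));
        }
        return servers;
    }

    // Walk the list top-down; isReachable stands in for a real connection attempt.
    static ServerEndpoint firstReachable(List<ServerEndpoint> servers,
                                         Predicate<ServerEndpoint> isReachable) {
        for (ServerEndpoint server : servers) {
            if (isReachable.test(server)) {
                return server;
            }
        }
        return null; // no server answered
    }

    public static void main(String[] args) {
        List<ServerEndpoint> servers = parse(Arrays.asList(
                "jopr-reg.example.com:7080",
                "jopr-backup.example.com:7080",
                "",
                "10.0.0.12:7080"));
        // Pretend the main Registration Server is down for maintenance.
        ServerEndpoint chosen = firstReachable(servers,
                s -> !s.address.equals("jopr-reg.example.com"));
        System.out.println("Registering with " + chosen); // Registering with jopr-backup.example.com:7080
    }
}
```

Splitting on the last colon keeps the sketch simple for hostnames and IPv4 addresses; IPv6 literals would need extra handling.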

Make sure "rhq.communications.connector.bind-address" is left undefined (out-of-box, it is commented out in agent-configuration.xml; make sure you keep it that way). Leaving this undefined tells the agent it needs to look up its local IP address and use that as its bind address. It does so by using the Java API "InetAddress.getLocalHost().getCanonicalHostName()". This relies on whatever network adapters are installed on the box, choosing one of them to determine which IP to use; usually it picks the first network adapter that the operating system reports. (Side note: this may choose an IP that is different from the one you actually want to use. You usually have to do some special configuration of your network adapters to get InetAddress.getLocalHost() to return the one you want.)
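If you suspect the agent is binding to the wrong adapter, a small diagnostic like the one below can help. It is a sketch of my own, not part of Jopr: it lists every address on every adapter that is up, then prints what InetAddress.getLocalHost() actually picks, so you can compare the two.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

// Diagnostic sketch: list the addresses the box offers, then show which one
// InetAddress.getLocalHost() actually picks.
class LocalAddressSketch {

    static List<String> listAddresses() throws SocketException {
        List<String> result = new ArrayList<>();
        Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
        if (nics == null) {
            return result; // older JDKs documented a null return when no interfaces are found
        }
        for (NetworkInterface nic : Collections.list(nics)) {
            if (!nic.isUp()) {
                continue; // skip adapters that are down
            }
            for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                result.add(nic.getName() + " -> " + addr.getHostAddress());
            }
        }
        return result;
    }

    public static void main(String[] args) throws SocketException {
        for (String line : listAddresses()) {
            System.out.println("available: " + line);
        }
        try {
            InetAddress local = InetAddress.getLocalHost();
            System.out.println("getLocalHost(): " + local.getHostAddress()
                    + " (" + local.getCanonicalHostName() + ")");
        } catch (UnknownHostException e) {
            System.out.println("getLocalHost() failed: " + e.getMessage());
        }
    }
}
```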

Repackage the agent installation into a new jar; this is your "golden distro". Push it out to all of your agent machines, and they can all start up without any additional configuration or setup.

RHQ-496 now allows the agent to determine its name at runtime if one was not explicitly given to it at setup time. This code is in trunk, but not in any current Jopr release. So, until we release our next version, you must use a trunk build to bundle a "golden distro" and deploy it to multiple machines.

After you have deployed your golden distro to the machines in your network, you are left with the question: how do I upgrade my agents? This leads to the desire to automatically upgrade agents already deployed and running in your environment. This feature is not complete yet, but most of the work is done and exists in trunk (see my earlier blog on this topic). To follow the development of this agent auto-update feature, watch JIRA RHQ-110. The finished implementation will hopefully look like the design described on the RHQ wiki.

Thursday, December 4, 2008

Configuration Change Detection in Jopr

A new feature has been added to trunk, a feature so interesting that it deserves its own blog.

I am sure most security-conscious administrators configure their IT infrastructure in a very specific way and they do not want anyone going onto any machine and re-configuring the machine or any of its software components willy-nilly. In fact, if something is reconfigured outside of a business' normal change-control processes, I would think administrators would want to be notified about it. It could be an innocent user mistakenly modifying something they should not be, or it could be an intruder trying to hack into the system. Being notified of configuration changes sounds like it could be a very useful thing.

Jopr now has this feature. If a plugin supports the configuration subsystem (i.e. it can retrieve configuration from its managed resource), the alert subsystem will have the ability to detect changes made in that remote managed resource and send notifications when that happens.

I've put together a demo that shows this feature in action. The scenario is quite simple - I have a Fedora box running sshd, and I do not want that sshd daemon process' configuration to change. If, for whatever reason, the configuration of sshd on the box does change, I want to be notified.

And because this config-change-notification feature is built into the core engine, any plugin that supports configuration gets this feature for free. So, if Jopr does not have a plugin that supports a particular resource whose configuration you want to monitor for changes, you can quite simply write your own plugin and deploy it into your Jopr environment and have this capability.
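Conceptually, change detection boils down to comparing a saved configuration snapshot with the one the plugin reports now. The sketch below shows that idea with plain maps of property names to values; it is not Jopr's alert-subsystem code, and the sshd-style keys are just example data.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of change detection between two configuration
// snapshots; not Jopr's actual alert-subsystem code.
class ConfigDiffSketch {

    static List<String> diff(Map<String, String> before, Map<String, String> after) {
        Set<String> keys = new TreeSet<>(before.keySet()); // sorted for stable output
        keys.addAll(after.keySet());
        List<String> changes = new ArrayList<>();
        for (String key : keys) {
            boolean wasThere = before.containsKey(key);
            boolean isThere = after.containsKey(key);
            if (wasThere && !isThere) {
                changes.add("removed " + key);
            } else if (!wasThere && isThere) {
                changes.add("added " + key + "=" + after.get(key));
            } else if (!Objects.equals(before.get(key), after.get(key))) {
                changes.add("changed " + key + ": " + before.get(key) + " -> " + after.get(key));
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        Map<String, String> baseline = new LinkedHashMap<>();
        baseline.put("Port", "22");
        baseline.put("PermitRootLogin", "no");

        Map<String, String> current = new LinkedHashMap<>(baseline);
        current.put("PermitRootLogin", "yes"); // someone edited the file

        List<String> changes = diff(baseline, current);
        if (!changes.isEmpty()) {
            // prints: ALERT: [changed PermitRootLogin: no -> yes]
            System.out.println("ALERT: " + changes);
        }
    }
}
```

Because the comparison only sees names and values, it does not care whether the plugin read them from a file, a database or anything else.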

I can envision people finding it helpful to watch the following for configuration changes (and you can already do some of these today, thanks to existing Jopr plugins):

- JBossAS's main jboss-service.xml configuration file
- JBossAS's authentication configuration (login-config.xml)
- JBossAS's datasource configuration
- /etc/hosts
- Jopr Agent's own configuration
- ...and many more...

And configuration does not have to be stored in a file on a filesystem. The Jopr configuration subsystem makes no distinction between configuration stored in a file, in a database, an LDAP server or whatever you can think of. It's the plugin's job to translate the resource's configuration into configuration data that conforms to the plugin's metadata. Once the configuration data makes it into the core engine, it is treated the same.

And finally, if a configuration change is detected, and that change was unauthorized, the Jopr user has the ability to immediately rollback that change by reverting to an earlier configuration set. This configuration-rollback feature is orthogonal to the change-notification feature, but you can see how both can be used hand-in-hand to keep a tight grip on your IT infrastructure's configuration.
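Rollback works because earlier configuration versions are kept around. The following is a minimal sketch of that idea under my own assumptions: the class and method names are illustrative rather than Jopr's API, and a mutable map stands in for the managed resource.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a configuration history with rollback; the class
// and method names are illustrative, not Jopr's API.
class ConfigHistorySketch {

    private final List<Map<String, String>> versions = new ArrayList<>();

    // Record an immutable copy of each configuration observed on the resource.
    int record(Map<String, String> config) {
        versions.add(Collections.unmodifiableMap(new HashMap<>(config)));
        return versions.size() - 1;
    }

    int size() {
        return versions.size();
    }

    // Push an earlier version back to the resource (a mutable map stands in for
    // the managed resource) and record it as the newest version, so the
    // unauthorized change stays in the history for auditing.
    Map<String, String> rollbackTo(int version, Map<String, String> resource) {
        Map<String, String> target = versions.get(version);
        resource.clear();
        resource.putAll(target);
        record(target);
        return target;
    }

    public static void main(String[] args) {
        ConfigHistorySketch history = new ConfigHistorySketch();
        Map<String, String> sshd = new HashMap<>();
        sshd.put("PermitRootLogin", "no");
        int good = history.record(sshd);

        sshd.put("PermitRootLogin", "yes"); // unauthorized edit detected
        history.record(sshd);

        history.rollbackTo(good, sshd);
        // prints: {PermitRootLogin=no} after 3 recorded versions
        System.out.println(sshd + " after " + history.size() + " recorded versions");
    }
}
```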

Thursday, November 27, 2008

Transaction Recovery in JBossAS

It started out so innocently - running my J2EE app under high load, I would notice this message repeat many times in the JBossAS log file:

[com.arjuna.ats.internal.jta.resources.arjunacore.norecoveryxa]
Could not find new XAResource to use for recovering
non-serializable XAResource ...

The desire to hunt down the meaning of this error message set me off on a long voyage that involved learning about the innards of the JBoss Transaction Manager (JBossTS), stepping through its code, writing custom Transaction Manager objects, and reading through many pages of documentation, forum threads and JIRA issues.

What I came away with from this huge endeavor is the realization that the JBossTS integration with JBossAS is not very well documented and is not easy to use. What I hope to achieve with this blog is to provide what I feel is lacking in the current JBossAS documentation: a single page where you can find all the information you need to ensure your application running in JBossAS is fully recoverable in the event of an XA transaction failure. (Note: I am using JBossAS 4.2.1; hopefully, later JBossAS versions will make what I am going to discuss easier.)

First, let me say this: if you think your application deployed in JBossAS can recover from transaction failures, I recommend you read this blog and double-check your configuration. Unless you took very specific steps to configure your JBossAS deployment to support XA transaction recovery, you will not be able to recover from failed transactions, even if you are using JBossAS with JBossTS integrated and even if you are using XA datasources (i.e. your data sources are defined in your -ds.xml files via <xa-datasource>). The only outward sign that your app cannot perform transaction recovery comes when you actually get a failure, which causes the message above ("Could not find new XAResource to use for recovering non-serializable XAResource") to appear in your log file.

Why is this? Because, out-of-box, JBossAS is not fully configured to perform recovery. You have to manually configure JBossAS to fully enable this feature, and it's not as simple as changing some configuration settings or deploying some additional canned services. It also involves writing/deploying your own custom Transaction Manager code and integrating it into the JBossAS server, because there are problems with some of the code currently shipped with JBossAS. This is why I suspect many people believe they can recover from transaction failures but really cannot: I'm not convinced many people know enough about this issue to write this custom Transaction Manager code and deploy it properly in JBossAS. Hopefully this blog will put all of this information in a single place and get people "on the road to recovery".

I will now walk you through everything that happened in my lengthy investigation of this issue. You can follow my train of thought in the JBossTS forum thread I started that shows you the progression of this investigation.

Before we begin, I must point out that for any of this to work at all, you must deploy your data sources to use XA, i.e. your -ds.xml files need to use <xa-datasource>. To learn how to switch your data sources over to use XA, refer to the JBossAS wiki, specifically the part that talks about "Parameters specific for javax.sql.XADataSource usage".

Special Note To Oracle Users: XA recovery will also not work unless you grant special privileges to your XA datasource's user (i.e. the user whose credentials you define in the xa-datasource definition). If you do not do this, "XAException.XAER_RMERR" errors will occur. The privileges you need to grant include:

GRANT SELECT ON sys.dba_pending_transactions TO db_user;
GRANT SELECT ON sys.pending_trans$ TO db_user;
GRANT SELECT ON sys.dba_2pc_pending TO db_user;
GRANT EXECUTE ON sys.dbms_system TO db_user;
GRANT SELECT ON v$xatrans$ TO db_user;

The first four you definitely need. The last one is Oracle version dependent. As documented in http://www.orafaq.com/wiki/XA_FAQ, "for Oracle 7.3 databases one needs to run the XAVIEW.SQL SQL script as user SYS. This script will create the V$XATRANS$ view. Grant select access on it to PUBLIC. This script is located in the $ORACLE_HOME/rdbms/admin directory. Please note, XAVIEW.SQL is not required for XA applications running on Oracle8 and above." Talk to your DBA to see which are required for your specific database installation. I've only tested XA on Oracle10g so I can't speak for other versions.

First, of course, this all started with those XA recovery failure messages in the JBossAS server log. (For the curious, I believe the entire reason I was getting them was that, under heavy load, my application was maxing out its connection pool, which actually exceeded my processes/sessions limit in Oracle. Oracle promptly rejected the extra connection attempts, causing my transactions to fail. Bumping up my Oracle processes/sessions configuration seems to have fixed the cause of most, if not all, of my failures.) Anyway, back to the XA recovery error message. At the time, it wasn't very clear what that log message was saying, but it sounded bad enough for me to search the 'net for it and see what others were saying. A lot of people reported seeing it, but not much was said about how to correct it, and more specifically, how to correct it within the context of the JBossAS application server. I found wiki pages that talk about this error message (such as this one), but only from the perspective of the standalone JBossTS product. This goes back to my assertion that JBossAS needs more documentation on its integration with JBossTS, because the JBossTS documentation only gets you so far (which is why I submitted JIRA JBAS-6244). The JBossTS documentation itself seemed comprehensive; my concern was the lack of JBossAS documentation on its integration with JBossTS. For example, the JBossTS wiki page tells me this:

You need to provide an instance of a XAResourceRecovery
implementation and tie it into the recovery process

I'm sure this makes perfect sense to the developer familiar with the JBossTS API and to the core JBossAS developers themselves that are integrating JBossTS into the server. But to a J2EE developer - the guy who is simply deploying his J2EE/EJB3 app in JBossAS and who should be free from having to worry about the internals of the app server's transaction manager - this is very confusing and leads to more questions than answers. For example, what is "XAResourceRecovery"? How do I provide one? And how do I tie it into the recovery process? I found no easy answers to those questions in the JBossAS documentation.

Searching the JBossTS documentation further, I found information on the XAResourceRecovery class. It turns out this is a JBossTS API that provides the hook necessary to recover from a transaction failure for a particular resource, like a JDBC data source (that answers the question, "what is XAResourceRecovery?"). JBossTS provides a few of its own implementations out-of-box and because JBossAS ships with the JBossTS product, JBossAS itself comes with these XAResourceRecovery implementations out-of-box as well (and this answers the "how do I provide one?" question - but as you will see shortly, that is not the end of this story). From the JBossTS documentation, I see:

Recovery of XA datasources can sometimes
be implementation dependant, requiring developers to
provide their own XAResourceRecovery instances. However,
JBossTS ships with several out-of-the-box implementations
that may be useful.

This wiki page lists two implementations. One is specific to Oracle, but since my app needs to support both Oracle and PostgreSQL (and hopefully more in the future), I want to use the second one: com.arjuna.ats.internal.jdbc.recovery.JDBCXARecovery. As the JBossTS documentation states, "this recovery implementation should work on any datasource that is exposed via JNDI." (emphasis is mine, because as I will show you shortly, this class is completely unusable when deployed within JBossAS).

OK, so I should be golden now. All I need to do in order to enable XA recovery for any data source deployed in my JBossAS is to set those JDBCXARecovery configuration properties specified in that wiki page and tie that JDBCXARecovery implementation into the recovery process. Now, how do I do this? (this is that third question I had asked myself earlier). Unfortunately, there is no JBossAS documentation that I found that describes how to do this. But I did see another JBossTS wiki page that describes how to do this for the standalone JBossTS product, which states:

To inform the recovery system about each
of the XAResourceRecovery instances, it is necessary to
specify their class names through property variables.
Any property variable found in the properties file, or
registered at runtime, which starts with the name
com.arjuna.ats.jta.recovery.XAResourceRecovery will be
assumed to represent one of these instances, and its
value should be the class name.
...
Additional information that will be passed to the
instance when it is created may be specified after
a semicolon.
...
Note: These properties need to go into the JTA section
of the property file.
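The convention quoted above (any property whose name starts with com.arjuna.ats.jta.recovery.XAResourceRecovery names a recovery class, with optional extra information after a semicolon) can be mimicked in a few lines. This is only an illustration of the documented rule, not the JBossTS property-loading code; splitting at the first semicolon and the com.example class name are my assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of the documented naming convention; this is
// not JBossTS's actual property-loading code.
class RecoveryPropertySketch {

    static final String PREFIX = "com.arjuna.ats.jta.recovery.XAResourceRecovery";

    static final class RecoveryEntry {
        final String className;
        final String parameter; // null when the value has no ';'

        RecoveryEntry(String className, String parameter) {
            this.className = className;
            this.parameter = parameter;
        }
    }

    static List<RecoveryEntry> findRecoveryEntries(Map<String, String> props) {
        List<RecoveryEntry> entries = new ArrayList<>();
        // TreeMap only makes the iteration order predictable for the example.
        for (Map.Entry<String, String> e : new TreeMap<>(props).entrySet()) {
            if (!e.getKey().startsWith(PREFIX)) {
                continue; // not a recovery registration
            }
            String value = e.getValue();
            int semi = value.indexOf(';');
            if (semi < 0) {
                entries.add(new RecoveryEntry(value.trim(), null));
            } else {
                entries.add(new RecoveryEntry(value.substring(0, semi).trim(),
                        value.substring(semi + 1).trim()));
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        Map<String, String> props = new TreeMap<>();
        props.put(PREFIX + "JDBC", "com.arjuna.ats.internal.jdbc.recovery.JDBCXARecovery");
        props.put(PREFIX + "Other", "com.example.MyRecovery;extra-info"); // hypothetical class
        props.put("DatabaseJNDIName", "java:/MyDS"); // ignored: no recovery prefix
        for (RecoveryEntry entry : findRecoveryEntries(props)) {
            System.out.println(entry.className + " gets parameter " + entry.parameter);
        }
    }
}
```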

OK, now I know how to do this for the standalone JBossTS. But how and where do I do this for the JBossAS integration of JBossTS? I found the file "jbossjta-properties.xml" in the "<jboss-install-dir>/server/default/conf" directory; as it turns out, this file defines the properties that configure the internals of the JBossTS integrated into the JBossAS server. Following the configuration instructions from the JBossTS documentation quoted above, I added this to that jbossjta-properties.xml file:

<properties depends="arjuna" name="jta">
<!-- add this to tie in the recovery object to my JBossTS -->
<property
name="com.arjuna.ats.jta.recovery.XAResourceRecoveryJDBC"
value="com.arjuna.ats.internal.jdbc.recovery.JDBCXARecovery"/>
<property name="DatabaseJNDIName" value="java:/MyDS"/>
<property name="UserName" value="my-db-username"/>
<property name="Password" value="my-db-password"/>
...

That is exactly how it is documented. However, it doesn't work. I get this at runtime:

java.lang.NullPointerException
at javax.naming.InitialContext.getURLScheme(InitialContext.java:269)
at javax.naming.InitialContext.getURLOrDefaultInitCtx(InitialContext.java:318)
at javax.naming.InitialContext.lookup(InitialContext.java:392)
at com.arjuna.ats.internal.jdbc.recovery.JDBCXARecovery.createDataSource(JDBCXARecovery.java:174)
...

I actually had to grab the JBossTS source code and step through it in a debugger to see what was really happening and why this NPE is thrown. I give the full details of why this NPE occurs in one of my forum thread posts (go here for the technical details), but suffice it to say that, in order for those three properties (DatabaseJNDIName, UserName, Password) to be read by the recovery implementation, I had to provide a parameter to the first property. The parameter value could be anything (I could specify "foo" if I wanted), but since this parameter is meant to be the URL to a property file, I specified the name of the property file itself:

<property
name="com.arjuna.ats.jta.recovery.XAResourceRecoveryJDBC"
value="com.arjuna.ats.internal.jdbc.recovery.JDBCXARecovery;jbossjta-properties.xml"/>

OK, that hurdle has been jumped. Start up again and... whoops:

java.lang.ClassCastException: org.jboss.resource.adapter.jdbc.WrapperDataSource
at com.arjuna.ats.internal.jdbc.recovery.JDBCXARecovery.createDataSource(JDBCXARecovery.java:174)

Back in the debugger, I found that JDBCXARecovery tries to cast the object found by the JNDI lookup to an XADataSource, but JBossAS does not bind that type of object to JNDI. It binds a WrapperDataSource, which is not an XADataSource. Therefore, this JDBCXARecovery object can never work when deployed in JBossAS. This is one reason why relying on documentation for the standalone JBossTS product is insufficient and why JBossAS integration docs are really needed. JDBCXARecovery probably works in some environments, but it most certainly does not work (and will never work) when integrated with JBossAS.
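The failure mode is easy to picture: a blind cast of whatever JNDI returns. As a small illustration (my own sketch, not JBossTS code), checking the type first turns the bare ClassCastException into a message that says what was actually bound; a plain Object stands in for the WrapperDataSource here.

```java
import javax.sql.XADataSource;

// Illustrative sketch: check the type of the JNDI-bound object before casting,
// instead of failing with a bare ClassCastException.
class JndiCastSketch {

    static XADataSource requireXADataSource(String jndiName, Object bound) {
        if (bound instanceof XADataSource) {
            return (XADataSource) bound;
        }
        String actual = (bound == null) ? "nothing" : bound.getClass().getName();
        throw new IllegalStateException(jndiName + " is bound to " + actual
                + ", which is not a javax.sql.XADataSource");
    }

    public static void main(String[] args) {
        // A plain Object stands in for the WrapperDataSource that JBossAS binds.
        try {
            requireXADataSource("java:/MyDS", new Object());
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A check like this only improves the error message; actually recovering inside JBossAS still requires a recoverer that knows how to reach the underlying XADataSource.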

Now, it turns out this class-cast problem has already been discussed on a prior forum thread and reported in a JIRA - JBTM-319. I wish I knew that before I started all of this (did I mention we need JBossAS docs on this? :)

Reading that JIRA, it looks like there is an XAResourceRecovery implementation written specifically for deployment inside JBossAS 4.2 (AppServerJDBCXARecovery), introduced in version 4.2.3.SP8 (I'll assume it made it into that version's distribution). However, I'm using an earlier version of JBossAS, so I had to take the source code for AppServerJDBCXARecovery.java, compile it, bundle its binary in a jar file, and deploy that jar into my JBossAS's "server/default/lib" directory.

I found that the Javadocs for that class describe how to configure it. In addition, Jonathan Halliday very recently added some documentation on the JBossTS wiki that discusses this as well; it refers to the Javadoc for the technical details. He does, however, confirm my finding that this class is not in earlier versions of JBossAS: "Note that AppServerJDBCXARecovery is not present in JBossAS (you need to download and build it from source) or early EAP releases"

The Javadocs say, in part:

To use this class, add an XAResourceRecovery
entry in the jta section of jbossjta-properties.xml for
each datasource for which you need recovery, ensuring the
value ends with ;<datasource-name> i.e. the same value
as is in the -ds.xml jndi-name element. You also need the
XARecoveryModule enabled and appropriate values for
nodeIdentifier and xaRecoveryNode set. See the JBossTS
recovery guide if you are unclear on how the recovery
system works.

*sigh* - back to reading about the internals of JBossTS to learn what "appropriate values for nodeIdentifier and xaRecoveryNode" means. I'm sure, again, this makes perfect sense to someone familiar with the JBossTS product, but I find it really annoying that a J2EE deployer needs to know all of this just to enable XA recovery. But OK, hopefully this will get easier in the future. Marching forward...

The Javadoc instructions tell me to refer to the JBossTS Recovery Guide, and it is in there that I read:

A value of * will force JBossTS to recover
(and possibly rollback) all transactions irrespective of
their node identifier and should be used with caution.
The contents of com.arjuna.ats.jta.xaRecoveryNode
should be alphanumeric and match the values of
com.arjuna.ats.arjuna.xa.nodeIdentifier.
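Read literally, the guide describes a simple matching rule. Here is a sketch of that rule as I understand it from the quote; it is a simplification for illustration, not the JBossTS recovery module itself.

```java
// Illustrative sketch of the matching rule quoted above; not JBossTS code.
class RecoveryNodeSketch {

    static final String ANY_NODE = "*";

    // True if a recovery manager configured with xaRecoveryNode should act on an
    // in-doubt transaction that was created by txNodeIdentifier.
    static boolean shouldRecover(String xaRecoveryNode, String txNodeIdentifier) {
        if (ANY_NODE.equals(xaRecoveryNode)) {
            return true; // recovers (and possibly rolls back) everything: use with caution
        }
        if (!xaRecoveryNode.matches("[A-Za-z0-9]+")) {
            throw new IllegalArgumentException("xaRecoveryNode should be alphanumeric: " + xaRecoveryNode);
        }
        return xaRecoveryNode.equals(txNodeIdentifier);
    }

    public static void main(String[] args) {
        System.out.println(shouldRecover("1", "1")); // true: same node
        System.out.println(shouldRecover("1", "2")); // false: another server's transaction
        System.out.println(shouldRecover("*", "2")); // true: wildcard
    }
}
```

The wildcard is the dangerous case when several servers share the same database: one server could roll back in-doubt work that belongs to another.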

This leads me back to "jbossjta-properties.xml", and lo and behold, I do see a "com.arjuna.ats.arjuna.xa.nodeIdentifier" property set there; its value is "1". I didn't look deeply into what this actually identifies, but I assume it identifies this JBossTS instance (I could be wrong on this).

So, following the instructions, I went back to my "jbossjta-properties.xml" and configured it to use this new recoverer instead of the unusable JDBCXARecovery implementation and to use the appropriate value for xaRecoveryNode:

<property
name="com.arjuna.ats.jta.recovery.XAResourceRecoveryJDBC"
value="com.arjuna.ats.internal.jdbc.recovery.AppServerJDBCXARecovery;MyDS"/>
<!-- xaRecoveryNode should match value in nodeIdentifier or be * -->
<property name="com.arjuna.ats.jta.xaRecoveryNode" value="1"/>

I'm really getting close now! There is one slight problem, however. When I run my application server for the very first time, my data source is not deployed yet! My application requires the user to run through a "post-installation" UI in order to do things like tell me what database vendor the user is using (Postgres or Oracle), the JDBC URL, database username, password, etc. My application then writes out some deployment information and hot deploys the ds.xml at runtime (a great feature provided to me by JBossAS - hot deployment of data sources is very cool).

Anyway, this causes problems because before my user runs this "post-install" step, I have no data source deployed and this recovery object will dump an ugly stack trace to the log because it can't find the data source. The exception is an MBeanException with a root cause of "javax.management.InstanceNotFoundException: jboss.jca:name=MyDS,service=ManagedConnectionFactory is not registered." This is to be expected, looking at the code of AppServerJDBCXARecovery.

So what I had to do was modify the AppServerJDBCXARecovery code so it can tolerate the times when the data source is not deployed. (I'll post a follow-up with a URL to tell you where you can find this modified code; it's not checked into svn yet, but will be soon. It is a pretty simple change. [update: the source can now be viewed here])
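The general shape of such a change can be sketched in plain Java. To be clear, this is a hypothetical illustration, not the modified AppServerJDBCXARecovery: the Callable stands in for the real MBean/JNDI lookup, and the one-warning-then-quiet policy is my own assumption.

```java
import java.util.Optional;
import java.util.concurrent.Callable;

// Hypothetical sketch of the "tolerate a missing data source" idea; this is not
// the author's modified AppServerJDBCXARecovery.
class TolerantLookupSketch {

    private final Callable<Object> lookup; // stands in for the real MBean/JNDI lookup
    private boolean warnedAboutMissing = false;

    TolerantLookupSketch(Callable<Object> lookup) {
        this.lookup = lookup;
    }

    // Returns the data source if it is deployed; otherwise logs one short
    // message (instead of a stack trace on every recovery pass) and returns empty.
    Optional<Object> tryGetDataSource() {
        try {
            Optional<Object> ds = Optional.ofNullable(lookup.call());
            warnedAboutMissing = false; // deployed now; warn again if it disappears later
            return ds;
        } catch (Exception notDeployedYet) {
            if (!warnedAboutMissing) {
                System.out.println("Data source not deployed yet, skipping recovery: "
                        + notDeployedYet.getMessage());
                warnedAboutMissing = true;
            }
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        final Object[] deployed = {null};
        TolerantLookupSketch recovery = new TolerantLookupSketch(() -> {
            if (deployed[0] == null) {
                throw new IllegalStateException("MyDS is not registered");
            }
            return deployed[0];
        });
        System.out.println(recovery.tryGetDataSource().isPresent()); // false, after one warning
        System.out.println(recovery.tryGetDataSource().isPresent()); // false, no repeated warning
        deployed[0] = new Object(); // the post-install step hot deploys the data source
        System.out.println(recovery.tryGetDataSource().isPresent()); // true
    }
}
```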

I then recompiled my custom version of AppServerJDBCXARecovery, bundled it in a .jar, placed that jar in my JBossAS's server/default/lib directory and restarted the server. Now no errors occur at startup, and after deploying my data source, I confirmed that the recovery object is able to obtain my XADataSource!

And that's it, it is that simple. :-) [update: not so fast. After I wrote this blog, I hit another problem, documented in JIRA JBTM-441. This is bad because a very common recovery use case (the database or network crashes) causes recovery to fail until you restart your app server, and this is true for all currently released JBossAS versions (4.3 and under, as of today, 12/6/2008). You must build a patched version of AppServerJDBCXARecovery, attached to that JIRA, and deploy it yourself to work around the problem.]

At this point, I have an application with XA data sources deployed and a transaction manager configured to recover any transactions that fail. I plan on writing some test code that forces transaction failures to occur, so I can actually verify that the recovery features are fully enabled, but I have very little doubt that things will work. Once JBossTS is able to get my XADataSource, it's just a matter of JBossTS doing what it does best, which includes performing this transaction recovery.

PHEW! All of this investigation took a lot of time and energy, way too much for my liking. Hopefully I can save someone else a few hours (or days :) with this information. It could have turned my several days into about 30 minutes. :}

Saturday, November 22, 2008

Transaction Timeouts and EJB3/JPA

I recently discovered somewhat odd behavior in the way transactions are timed out in my EJB3/JPA app. It was surprising to me how it worked, and though I guess it makes sense when thinking about it, something here still doesn't "feel right".

Consider an EJB3/JPA application deployed in JBossAS 4.2 that uses Arjuna's Transaction Manager (aka JBossTM, although I still call it Arjuna :) and Hibernate as the JPA implementation.

Suppose an EJB3 stateless session bean (SLSB) method is annotated with this:

@org.jboss.annotation.ejb.TransactionTimeout(60)

If my method takes longer than 60 seconds, the transaction it is running in will timeout, and thus rollback. But, what happens if my method is executing a very long running SQL update? Or, maybe my method is simply writing out a very long data file to a remote file system. Or, maybe I'm just stupid and my method is executing "Thread.sleep(30 * 60 * 1000)"? In other words, what happens if my method takes longer than this transaction timeout?

What I was assuming would happen is the thread running my method would get interrupted, and (assuming the JDBC driver or file IO subsystem or Thread.sleep or whatever my method was doing at the time can handle the interrupt) the method would abort with that interrupted exception and immediately begin the rollback procedures. It would do this because once the method exits, the EJB3 interceptor chain would begin to unroll and eventually hit the transaction manager interceptors whose job it is to abort the transaction.

That is close to what happens, but not exactly.

It turns out the transaction manager detects that the transaction has timed out and aborts it from within an asynchronous thread (different from the thread running my method), but it never sends an interrupt to my method. So my method continues on even after the timeout period, and only when it returns does the interceptor chain finally get a chance to abort the transaction. But by this time, it's already been aborted! The interceptor kindly tells me this in a log message, but it's late to the party: the transaction has already been rolled back. This doesn't actually cause any harm, because the transaction manager simply says, "this has already been aborted and the rollback was performed earlier, I'll just log a warning and skip the abort procedure".

But I question why my method was not also given a chance to abort. My method could run for 60 minutes and do everything successfully, yet because it exceeded the transaction timeout, the entire transaction is rolled back and those 60 minutes of work are wasted.

Of course, the answer to that would be, "just set your transaction timeout to a higher value". But that's not really my point. My point is, why is my thread even allowed to waste its time when the transaction manager knows the timeout has expired and the transaction has been rolled back? I think it should at least attempt to warn the thread about the situation by sending an interrupt to it (at that point, it would be my method's job to handle the thrown InterruptedException).
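To make the difference concrete, here is a minimal, hypothetical simulation using only JDK threads; it is not Arjuna's code or API. A scheduled task plays the part of the transaction manager's asynchronous timeout thread: it marks the simulated transaction as rolled back and, only when the made-up `interruptOnTimeout` flag is true, also interrupts the worker thread. With the flag off you get what I observed (the work runs its full length after the rollback); with it on, the sleeping method receives an `InterruptedException` and can give up early. All class and method names here are invented for illustration.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical simulation, not Arjuna code: a background timer "rolls back" a
 * transaction when its timeout expires while another thread is still doing the work.
 */
class TimeoutReaperSketch {

    /** What the worker observed during one simulated transactional call. */
    static final class Outcome {
        final boolean interrupted;   // did the worker receive an InterruptedException?
        final boolean rolledBack;    // had the timer already rolled back the transaction?
        final long elapsedMillis;    // how long the worker actually ran

        Outcome(boolean interrupted, boolean rolledBack, long elapsedMillis) {
            this.interrupted = interrupted;
            this.rolledBack = rolledBack;
            this.elapsedMillis = elapsedMillis;
        }
    }

    static Outcome callWithTimeout(long timeoutMillis, long workMillis, boolean interruptOnTimeout) {
        AtomicBoolean rolledBack = new AtomicBoolean(false);
        Thread worker = Thread.currentThread();
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.schedule(() -> {
            rolledBack.set(true);          // the asynchronous rollback seen in the log
            if (interruptOnTimeout) {
                worker.interrupt();        // the extra step I am wishing for
            }
        }, timeoutMillis, TimeUnit.MILLISECONDS);

        long start = System.nanoTime();
        boolean interrupted = false;
        try {
            Thread.sleep(workMillis);      // stands in for the long-running method body
        } catch (InterruptedException e) {
            interrupted = true;            // the method's chance to stop wasting work
        } finally {
            timer.shutdownNow();
        }
        long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        return new Outcome(interrupted, rolledBack.get(), elapsed);
    }

    public static void main(String[] args) {
        // The 2 second timeout and 10 second sleep from my test, scaled down by a factor of ten.
        Outcome observed = callWithTimeout(200, 1000, false);
        Outcome wished = callWithTimeout(200, 1000, true);
        System.out.println("no interrupt: interrupted=" + observed.interrupted
                + ", rolledBack=" + observed.rolledBack + ", ran " + observed.elapsedMillis + " ms");
        System.out.println("interrupt:    interrupted=" + wished.interrupted
                + ", rolledBack=" + wished.rolledBack + ", ran " + wished.elapsedMillis + " ms");
    }
}
```

In both runs the transaction ends up rolled back; the only difference is whether the worker finds out in time to stop.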

I've tested this to see if an interrupt is sent and I don't see it. See below for the log messages I see after running some test code. I had an SLSB method annotated with a transaction timeout of 2 seconds, and in my SLSB method, I block within a "Thread.sleep(10000)" call (so it pauses in my method for 10 seconds). Take note of the log timestamps and notice that my sleep is allowed to return normally after 10 seconds, but after the 2nd second, you see the transaction manager aborted my transaction! Therefore, my method was allowed to continue until it returned normally - at which time, the interceptor chain tried to abort the transaction a second time:


2008-11-23 01:09:33 INFO  [STDOUT] !!!!!!!!!BEFORE SLEEP
2008-11-23 01:09:35 WARN  [com.arjuna.ats.arjuna.logging.arjLoggerI18N]
[com.arjuna.ats.arjuna.coordinator.BasicAction_58] - Abort of action
id a0b0c21:ab6:4928f32d:1225 invoked while multiple threads active within it.
2008-11-23 01:09:35 WARN  [com.arjuna.ats.arjuna.logging.arjLoggerI18N]
[com.arjuna.ats.arjuna.coordinator.CheckedAction_2] - CheckedAction::check
- atomic action a0b0c21:ab6:4928f32d:1225 aborting with 1 threads active!
2008-11-23 01:09:43 INFO  [STDOUT] !!!!!!!!!AFTER SLEEP
2008-11-23 01:09:43 WARN  [com.arjuna.ats.arjuna.logging.arjLoggerI18N]
[com.arjuna.ats.arjuna.coordinator.BasicAction_40] - Abort called on
already aborted atomic action a0b0c21:ab6:4928f32d:1225
2008-11-23 01:09:43 ERROR [test.slsb] Failed. Cause:
java.lang.IllegalStateException: [com.arjuna.ats.internal.jta.transaction.arjunacore.inactive]
[com.arjuna.ats.internal.jta.transaction.arjunacore.inactive] The transaction is not active!
java.lang.IllegalStateException: [com.arjuna.ats.internal.jta.transaction.arjunacore.inactive]
[com.arjuna.ats.internal.jta.transaction.arjunacore.inactive] The transaction is not active!
at com.arjuna.ats.internal.jta.transaction.arjunacore.TransactionImple.commitAndDisassociate(TransactionImple.java:1372)
at com.arjuna.ats.internal.jta.transaction.arjunacore.BaseTransaction.commit(BaseTransaction.java:135)
at com.arjuna.ats.jbossatx.BaseTransactionManagerDelegate.commit(BaseTransactionManagerDelegate.java:87)
at org.jboss.aspects.tx.TxPolicy.endTransaction(TxPolicy.java:175)
at org.jboss.aspects.tx.TxPolicy.invokeInOurTx(TxPolicy.java:87)
at org.jboss.aspects.tx.TxInterceptor$RequiresNew.invoke(TxInterceptor.java:262)
at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:101)
at org.jboss.aspects.tx.TxPropagationInterceptor.invoke(TxPropagationInterceptor.java:76)
at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:101)
at org.jboss.ejb3.stateless.StatelessInstanceInterceptor.invoke(StatelessInstanceInterceptor.java:62)
at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:101)
at org.jboss.aspects.security.AuthenticationInterceptor.invoke(AuthenticationInterceptor.java:77)
at org.jboss.ejb3.security.Ejb3AuthenticationInterceptor.invoke(Ejb3AuthenticationInterceptor.java:106)
at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:101)
at org.jboss.ejb3.ENCPropagationInterceptor.invoke(ENCPropagationInterceptor.java:46)
at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:101)
at org.jboss.ejb3.asynchronous.AsynchronousInterceptor.invoke(AsynchronousInterceptor.java:106)
at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:101)
at org.jboss.ejb3.stateless.StatelessContainer.localInvoke(StatelessContainer.java:214)
at org.jboss.ejb3.stateless.StatelessContainer.localInvoke(StatelessContainer.java:184)
at org.jboss.ejb3.stateless.StatelessLocalProxy.invoke(StatelessLocalProxy.java:81)
     ...

It is also interesting to note that the transaction manager knows that a thread is still running my method - see its log message: "aborting with 1 threads active".

This is worrisome to me because I could run into a condition where my SLSB methods might waste a whole lot of time processing a query or doing other work well after my timeout has triggered the rollback (unless I set my timeouts to some really high value that I know will never expire, but then, what's the point of the timeout?). I would rather have my method be interrupted so I don't waste system resources doing work that is just going to be (or has already been) rolled back. But for now, without any other options that I know of, I am just going to have to make sure I set my timeouts to some very large values for those methods that may (but most times may not) take a really long time to complete.
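One partial workaround, sketched here under stated assumptions rather than as anything JBoss provides: if a long method processes its work in chunks, it can keep its own copy of the timeout (assumed to be kept in sync by hand with the value on `@TransactionTimeout`) and check it between chunks. It cannot stop a single long SQL statement or file write in the middle, but it does avoid starting more work after the rollback. The `Deadline` class and `processUntilDeadline` method are hypothetical names.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical workaround sketch: the method checks its own copy of the transaction
 * timeout between units of work and stops once that time is up.
 */
class DeadlineCheckSketch {

    /** A point in time after which further transactional work is pointless. */
    static final class Deadline {
        private final long deadlineNanos;

        Deadline(long timeout, TimeUnit unit) {
            this.deadlineNanos = System.nanoTime() + unit.toNanos(timeout);
        }

        boolean expired() {
            // Compare by subtraction, since System.nanoTime() values may overflow.
            return System.nanoTime() - deadlineNanos >= 0;
        }
    }

    /** Runs up to itemCount units of work; returns how many completed before the deadline. */
    static int processUntilDeadline(int itemCount, long millisPerItem, Deadline deadline)
            throws InterruptedException {
        int completed = 0;
        for (int i = 0; i < itemCount; i++) {
            if (deadline.expired()) {
                break;                       // the transaction timeout has been reached: stop here
            }
            Thread.sleep(millisPerItem);     // stands in for one chunk of real work
            completed++;
        }
        return completed;
    }

    public static void main(String[] args) throws InterruptedException {
        Deadline deadline = new Deadline(200, TimeUnit.MILLISECONDS);
        int done = processUntilDeadline(100, 20, deadline);
        System.out.println("Completed " + done + " of 100 items before the deadline");
    }
}
```

The check only happens between chunks, so the granularity of the chunks bounds how much wasted work can still slip through.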

I wonder if there is some configuration property I can set on the transaction manager to tell it, "interrupt any active thread that may be running within a transaction that has timed out". I don't know of any such thing, but it just seems like this would be an obvious thing to want to do, so I would not be surprised at all if it turns out that I am just missing a piece of the puzzle. Feel free to comment on this blog if you know of a way to work around this issue or to let me know if there is something I am misunderstanding with respect to the way the transaction manager works (or can be configured to work).

As a side note, I was hoping I could prevent doing the extra, wasted work in the database by circumventing the JPA entity manager and using JDBC statements directly, thereby having access to the java.sql.Statement.setQueryTimeout API but, alas, the Postgres driver does not yet implement that.
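For drivers that do support it, the idea would look roughly like this minimal sketch: compute how much of the transaction timeout is left and hand that to `java.sql.Statement.setQueryTimeout`, which takes whole seconds and treats 0 as "no limit". The deadline bookkeeping and the helper names are assumptions for illustration; as noted above, the Postgres driver I was using does not yet implement this call, so there it would not help.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

/**
 * Hypothetical sketch: bound a single JDBC statement by whatever is left of the
 * transaction timeout, using java.sql.Statement.setQueryTimeout.
 */
class QueryTimeoutSketch {

    /** Whole seconds left before the deadline, rounded up so the limit is never 0 ("no limit" in JDBC). */
    static int remainingSeconds(long deadlineMillis, long nowMillis) {
        long remainingMillis = deadlineMillis - nowMillis;
        if (remainingMillis <= 0) {
            throw new IllegalStateException("Transaction timeout has already expired");
        }
        long seconds = (remainingMillis + 999) / 1000;
        return (int) Math.min(seconds, Integer.MAX_VALUE);
    }

    /** Runs an update with a query timeout; per JDBC, exceeding it raises SQLTimeoutException. */
    static int executeUpdateWithTimeout(Connection conn, String sql, long deadlineMillis)
            throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            // Drivers that do not implement query timeouts may throw here instead.
            stmt.setQueryTimeout(remainingSeconds(deadlineMillis, System.currentTimeMillis()));
            return stmt.executeUpdate(sql);
        }
    }
}
```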

Friday, November 21, 2008

Mobicents Platform Integrated with Jopr!

The Mobicents Platform has recently announced their integration with Jopr!

Jean has some really nice things to say:

http://jeanderuelle.blogspot.com/2008/11/mobicents-sip-servlets-gets-shinny-new.html

This validates what we are doing - which is developing an extensible management platform that anyone who has a software product that they want to manage/monitor can integrate with (and do so quickly and easily).

Monday, November 10, 2008

Monitoring Custom JMX MBeans With Jopr

My previous post explains how you can use the Hibernate plugin to manage your Hibernate applications with Jopr. In reality, the Hibernate plugin is just a customized JMX plugin because, after all, everything you do in relation to monitoring Hibernate is through its Hibernate Statistics MBean.

This raises the question: what if my own application has its own custom JMX MBeans? Can I use Jopr to manage and monitor those as well?

The answer is a most emphatic yes!

Out-of-box, you already get the generic JMX plugin. This was designed to be extensible by other plugins. In fact, it's where Jopr's JBossAS plugin, Tomcat plugin and Hibernate plugin all get a lot of their functionality from! It stands to reason that you can do the exact same thing for your own MBeans as the Jopr development team does for JBossAS, Tomcat and Hibernate. There is even an example custom JMX plugin Maven module that you can use as a starting point, should you want to write your own custom JMX plugin - see the actual code here, which you can copy and customize for your needs. BTW: when I say "custom JMX plugin", I mean a plugin that can manage your custom JMX MBeans. I do not mean a customized version of the out-of-box JMX plugin. In other words, the custom JMX plugin is not a replacement for the existing JMX plugin; it's merely an extension to it.
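For reference, this is the kind of custom MBean such a plugin would monitor. The sketch below is a plain JDK Standard MBean, not part of the example plugin module; the `RequestCounter` class and the `com.example.myapp:type=RequestCounter` ObjectName are hypothetical. JMX derives the `RequestCount` attribute from the getter and exposes `reset` as an operation because the implementation class name plus "MBean" matches the interface name.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class CustomMBeanSketch {

    /** Standard MBean interface: JMX exposes attribute "RequestCount" and operation "reset". */
    public interface RequestCounterMBean {
        long getRequestCount();
        void reset();
    }

    /** Implementation; its name must be the interface name minus the "MBean" suffix. */
    public static class RequestCounter implements RequestCounterMBean {
        private final AtomicLong count = new AtomicLong();

        public void recordRequest() { count.incrementAndGet(); }   // not in the interface, so not exposed

        @Override public long getRequestCount() { return count.get(); }

        @Override public void reset() { count.set(0); }
    }

    /** Registers the counter under a hypothetical ObjectName in the JVM's platform MBeanServer. */
    static ObjectName register(RequestCounter counter) throws Exception {
        ObjectName name = new ObjectName("com.example.myapp:type=RequestCounter");
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        mbs.registerMBean(counter, name);
        return name;
    }

    public static void main(String[] args) throws Exception {
        RequestCounter counter = new RequestCounter();
        ObjectName name = register(counter);
        counter.recordRequest();
        counter.recordRequest();
        // Read the attribute generically, the way a JMX client or monitoring agent would.
        Object value = ManagementFactory.getPlatformMBeanServer().getAttribute(name, "RequestCount");
        System.out.println(name + " RequestCount=" + value);
    }
}
```

A custom plugin descriptor would then point at that ObjectName and list which attributes to collect as metrics and which operations to expose.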

Watch the demo I made to see how you can create your own custom JMX plugin for deployment in any RHQ or Jopr Server (btw: RHQ is where the entire plugin framework lives; because it is upstream to Jopr, your RHQ plugins are completely 100% compatible with Jopr as well - I'll continue to use the Jopr name, but suffice it to say, whenever I say "Jopr", you can assume the same holds true for RHQ). The source and binaries used by the demo are available for download, in case you want to try it yourself after watching the demo.

The generic JMX plugin is very flexible in the way you can extend it. That said, there is still plenty more we can probably do to enhance it. Allowing it to generically handle more data types for operation parameters and results and attribute values are just some of the low-hanging fruit in here - if only we had 25 hours in a day. If anyone is interested in getting started contributing to the RHQ project, that would probably be a very good place to start - it's easy to understand and the places that need to be enhanced are localized to only one or maybe a couple of Java classes. If you are willing to try this out, feel free to ask about this in our freenode chat room at #rhq or send a message to one of our forums.
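To give a feel for that kind of enhancement, here is a small hypothetical sketch, not the JMX plugin's actual code: it turns string values (as a user might type them in a GUI) into objects of the types an MBean operation declares. It assumes the type names look like those returned by `MBeanParameterInfo.getType()`, e.g. `int` or `java.lang.String`; supporting another data type means adding another case.

```java
/**
 * Hypothetical sketch: convert string inputs into the parameter types an MBean
 * operation declares, so a generic client can call MBeanServerConnection.invoke.
 */
class ParamConverterSketch {

    static Object convert(String value, String typeName) {
        switch (typeName) {
            case "java.lang.String":
                return value;
            case "int": case "java.lang.Integer":
                return Integer.valueOf(value.trim());
            case "long": case "java.lang.Long":
                return Long.valueOf(value.trim());
            case "double": case "java.lang.Double":
                return Double.valueOf(value.trim());
            case "boolean": case "java.lang.Boolean":
                return Boolean.valueOf(value.trim());   // anything but "true" (any case) is false
            default:
                throw new IllegalArgumentException("Unsupported parameter type: " + typeName);
        }
    }

    /** Builds the params array to pass alongside the signature to invoke(name, op, params, signature). */
    static Object[] toParams(String[] values, String[] signature) {
        if (values.length != signature.length) {
            throw new IllegalArgumentException("Expected " + signature.length + " values");
        }
        Object[] params = new Object[values.length];
        for (int i = 0; i < values.length; i++) {
            params[i] = convert(values[i], signature[i]);
        }
        return params;
    }
}
```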

Sunday, November 9, 2008

Monitoring Hibernate With Jopr

I have heard several people ask questions regarding how they can manage and monitor Hibernate from Jopr. Some have even asked how they can do this even if their Hibernate app is not running inside a JBossAS server instance (for example, within its own, standalone J2SE virtual machine).

Jopr today has the capability to manage/monitor Hibernate if it's running inside Tomcat or JBossAS via its Hibernate plugin. It can even monitor more than one Hibernate application/JVM that is running on your machine, even if they are using different versions of Hibernate!

Today, I checked in code to the RHQ core and Jopr Hibernate plugin to have it also support Hibernate that is running in a standalone JVM (in fact, it can now support Hibernate running in any JVM, so long as it can remotely connect to the Hibernate Statistics MBean's MBeanServer).

This is such a cool feature, that I decided to "wink" it. Watch the flash demo to see how you can examine your Hibernate statistics in the Jopr GUI - things such as which queries were executed, how many times they were executed and how long it took to execute them; how many and which entities were created and deleted, etc. This is a very helpful set of features for developers - I can attest to that because I use this to examine the RHQ Server's own Hibernate usage.

There are a couple caveats:

First, the JVM that Hibernate is running in must have JMX remoting enabled and configured to accept connections from the agent. Google "com.sun.management.jmxremote" and read all about the settings used to configure this. The demo used something like this:


java -Dcom.sun.management.jmxremote.port=19988 \
-Dcom.sun.management.jmxremote.ssl=false \
-Dcom.sun.management.jmxremote.authenticate=false \
-jar helloworld.jar

Second, your application must have enabled the Hibernate Statistics MBean. This sounds obvious, but I guess I should explicitly mention it. If you don't tell Hibernate to turn on its statistics, you can't very well get any useful data from it. To do this, your application will have to execute something like this:

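import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import org.hibernate.SessionFactory;
import org.hibernate.jmx.StatisticsService;

StatisticsService mBean = new StatisticsService();
SessionFactory sessionFactory = ...get hibernate session factory...
mBean.setSessionFactory(sessionFactory);
ObjectName objectName = new ObjectName("Hibernate:application=MY_APP_NAME,type=statistics");
MBeanServer mbs = ManagementFactory.getPlatformMBeanServer(); // the MBeanServer that jmxremote exposes
mbs.registerMBean(mBean, objectName);
sessionFactory.getStatistics().setStatisticsEnabled(true); // without this, the counters stay at zero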

Once you place your statistics MBean in an MBeanServer that can be remotely accessed, your Jopr Hibernate plugin will do the rest.
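Before pointing Jopr at it, you may want to confirm the statistics MBean really is reachable remotely. Here is a minimal sketch using only the JDK's JMX remote API, assuming the port (19988) and ObjectName (`MY_APP_NAME`) from the examples above; the Jopr agent does its own connecting, so this is just a sanity check, not how the plugin is implemented. `QueryExecutionCount` is the attribute backed by Hibernate's `Statistics.getQueryExecutionCount()`.

```java
import java.net.MalformedURLException;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

class HibernateStatsClientSketch {

    /** The RMI connector URL for a JVM started with -Dcom.sun.management.jmxremote.port=PORT. */
    static JMXServiceURL urlFor(String host, int port) throws MalformedURLException {
        return new JMXServiceURL("service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
    }

    /** Connects remotely and reads how many queries Hibernate has executed so far. */
    static long queryExecutionCount(String host, int port, String appName) throws Exception {
        try (JMXConnector connector = JMXConnectorFactory.connect(urlFor(host, port))) {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            ObjectName name = new ObjectName("Hibernate:application=" + appName + ",type=statistics");
            return (Long) mbsc.getAttribute(name, "QueryExecutionCount");
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Queries executed: " + queryExecutionCount("localhost", 19988, "MY_APP_NAME"));
    }
}
```

If this fails to connect, the jmxremote settings on the Hibernate JVM are the first thing to check.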

Monday, November 3, 2008

Agent Auto Update

Now that we have introduced HA capabilities to RHQ, we expect to be able to support at least a few hundred agents within an RHQ ecosystem (if not more).

But this poses a new challenge for us and our customers - how do we keep all of those agents up-to-date? Once you get an RHQ environment set up and running, what happens when we release our next version of RHQ? We want to avoid having to remotely log onto each and every agent machine to manually update their old agents.

I've recently embarked on the first implementation of an "agent auto-update" feature, to be included in the next release of RHQ.

Soon, RHQ will no longer ship a separate agent distribution. We will now ship just an "RHQ distribution" that includes both a server and agent (this will be true of Jopr and JBoss ON as well). This will ensure that when you get a distribution binary, you will receive the server and agent that are compatible with each other.

The agent distribution is then downloadable directly from the server (e.g. http://<server>:7080/agentupdate/download). This lets you grab an agent distribution when you want to install an agent manually.
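For scripted manual installs, fetching that distribution is just an HTTP GET. The sketch below is a hypothetical helper, not part of RHQ: it builds the URL pattern shown above (using the 7080 port from that example) and streams the response to a local file whose name is made up here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

/** Hypothetical sketch: download the agent distribution from an RHQ server. */
class AgentDownloadSketch {

    /** Builds the download URL, following the example URL pattern above. */
    static URL downloadUrl(String serverHost) throws IOException {
        return URI.create("http://" + serverHost + ":7080/agentupdate/download").toURL();
    }

    /** Streams the distribution to the target file, replacing any earlier copy. */
    static Path download(String serverHost, Path target) throws IOException {
        try (InputStream in = downloadUrl(serverHost).openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        String host = args.length > 0 ? args[0] : "localhost";
        // Hypothetical local file name; it does not reflect the server's archive naming.
        Path saved = download(host, Paths.get("rhq-agent-download"));
        System.out.println("Saved agent distribution to " + saved.toAbsolutePath());
    }
}
```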

However, what happens if you already installed the agent? Soon, the agent will be able to automatically download an updated agent distribution from the server and install the update itself, all without user intervention or manual steps.

If you want to follow the progress of this feature or are interested in the technical details, watch the JIRA issue and feel free to join in the discussion on the developer forum thread.

http://jira.rhq-project.org/browse/RHQ-110
http://www.rhq-project.org/display/RHQ/Design-AgentAutoUpdate

Management Platform

Due to enormous peer pressure from my current team members, I have embarked on the creation of my first blog. :)

I am currently involved in the design and development of the RHQ management platform and the Jopr Middleware Management project. The blog posts I write in the future will provide some additional insight into the RHQ and Jopr projects.

If your company has a need for a JBoss Middleware Management product, may I suggest you look into JBoss Operations Network - which is Red Hat's offering that, at its core, is Jopr/RHQ.
