The Java EE Verifier and indirect and optional dependencies

3 08 2009

Running the Java EE 5 Verifier can be a useful way of checking EAR files and other Java EE artifacts before deploying and running them.

However, once you start using third-party libraries there’s one set of rules in the verifier that is rather too idealistic: the requirement that all referenced classes be present in the application. Any classes that are referenced but can’t be found are reported by the verifier as failures.

In theory, it’s perfectly reasonable that Java EE applications are basically supposed to be “self-contained”, and that all classes referenced within them need to be present within the application itself (obviously excluding those of the Java EE environment itself). Actually, Java’s “extension” mechanism is also supported as a way of using jars from outside of the application, but this has limitations and drawbacks of its own and doesn’t really change the overall picture. There’s a useful overview of this subject in the “Sun Developer Network” article Packaging Utility Classes or Library JAR Files in a Portable J2EE Application (this dates from J2EE 1.4, but is still broadly appropriate for Java EE 5).

Anyway, verifying that the application’s deliverable includes all referenced classes seems better than risking sudden “class not found” errors at run-time (possibly on a “live” system and possibly only in very specific situations). The trouble is that once you start using third-party libraries, you then also need to satisfy their own dependencies on further libraries, even where these are only needed by optional facilities that you never actually use. Then you also need all the libraries that those libraries reference, and so on. This can easily get out of hand, and require all sorts of libraries that aren’t ever actually used by your application.

As a simple example, take the UrlRewriteFilter library for rewriting URLs within Java EE web-applications. This is limited in scope and its normal use only involves a single jar, so you’d think it would be relatively self-contained.

However, one of its features is that you can configure its “logging” facilities to use any of a number of different logging APIs. In practice, I don’t use anything other than the default setting, which uses the normal servlet-context log. But its code includes references to log4j, commons-logging and SLF4J so that it can offer these as options. The documentation says that you need the relevant jar in your classpath if you’re using one of these APIs, but the Java EE Verifier tells you that they all need to be present – even if you’re not actually using them (on the perfectly reasonable basis that there’s code present that can call them).
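To make the distinction concrete, here is a minimal sketch of checking which of these optional logging APIs can actually be loaded at run time, as opposed to merely being referenced somewhere in a library’s code. The class OptionalLoggingProbe and its methods are hypothetical names of my own, not part of UrlRewriteFilter or the verifier; the three class names are the usual entry points of log4j 1.x, commons-logging and SLF4J.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: reports which optional logging APIs are loadable at run time. */
final class OptionalLoggingProbe {

    // Usual entry-point classes of the three optional logging APIs.
    static final String[] CANDIDATES = {
        "org.apache.log4j.Logger",
        "org.apache.commons.logging.LogFactory",
        "org.slf4j.LoggerFactory"
    };

    /** Returns true if the named class can be loaded, without initialising it. */
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, OptionalLoggingProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing class, or a class whose own dependencies are missing.
            return false;
        }
    }

    /** Maps each candidate class name to whether it is currently loadable. */
    static Map<String, Boolean> probe() {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String name : CANDIDATES) {
            result.put(name, isPresent(name));
        }
        return result;
    }
}
```

Calling OptionalLoggingProbe.probe() from inside the deployed application shows what is really on the classpath. The verifier, by contrast, objects to any reference it can’t resolve, whether or not the code containing it is ever executed.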

That’s not the end of the story. The SLF4J API in turn uses “implementation” jars to talk to actual logging facilities, and includes references to classes that are only present in such implementation jars. So you also need at least one such SLF4J implementation jar. At this point you’re now looking at the SLF4J website and trying to figure out which of its many jars you need. What are they all? Does it matter which one you pick? Perhaps you need all of them? Do they have any further dependencies on yet more jars? Are there any configuration requirements? Are these safe to include in your application without learning more about SLF4J? Do they introduce any security risks?

So apart from anything else, you’re now having to find out more than you ever wanted to know about SLF4J, just because a third-party library you’re using has chosen to include it as an option. Ironically, a mechanism intended to give you a choice between several logging APIs has ended up requiring you to bundle all of them, even when you’re not actually using any of them!

Anyway, in addition to the log4j jar, the commons-logging jar, the SLF4J API jar, and an SLF4J implementation jar, the UrlRewriteFilter also needs a commons-httpclient jar (though again, nothing in my own particular use of UrlRewriteFilter appears to actually use this). That in turn also requires a commons-codec jar.

Fortunately, that’s the limit of it for UrlRewriteFilter. But it’s easy to see how a third-party jar could have a whole chain of dependencies due to “optional” facilities that you’re not actually using.

As a rather different example, another library that I’ve used recently appears to have an optional feature that allows the use of Python scripts for something or other. This is an optional feature in one particular corner of the library, and is something I have no need for. To support this feature, the code includes references to what I presume are Jython classes. As a result the verifier requires Jython to be present (and then presumably any other libraries that Jython might depend on in turn). Now, bundling Jython into my Java EE application just to satisfy the verifier and avoid a purely-theoretical risk of a run-time “class not found” error seems plain crazy. If the code ever does unexpectedly try to use Jython, I’d much rather have it fail with a run-time exception than have it work successfully and silently do who-knows-what. To add insult to injury, Jython is presumably able to call Python libraries that might or might not be present but that the verifier will know nothing about – so bundling Jython in order to satisfy the verifier might actually make the application more vulnerable to code not being found at run-time.

With the mass of third-party libraries available these days, and the variety of dependencies these sometimes have, I suspect there must be cases that are far, far worse than this. (Anyone out there willing to put forward a “worst case”?)

So what’s the answer? Obviously you do need to bundle the jars for all classes that are actually used, but for jars whose classes are referenced but never actually used (and any further jars that they reference in turn) I can see a number of alternatives:

  • Work through all the dependencies and bundle all the jars so that the verifier is happy with everything. Often this is entirely appropriate or at least acceptable, but as we’ve seen above, this cure isn’t always very practical, and in some cases it can be worse than the disease.
  • A variation on the above is to leave the “unnecessary” jars out of the application but run the verifier on an adjusted copy of the application that does include them. That is, produce a “real” deliverable with just the jars that are actually needed, and a separate adjusted copy of it that also includes any other jars necessary to keep the verifier happy but that you know aren’t actually needed by the application. The verification is run on this adjusted copy, which is then discarded. The drawback is that you still have to work through the entire chain of dependencies and track down and get hold of all of the jars, even for those that aren’t really needed. There’s also the risk that you’ll treat a jar as unnecessary when it isn’t, which is exactly the mistake that the verifier is trying to protect you from.
  • Another alternative is to just give up and not use the verifier. But it seems a shame to miss out on the other verification rules just because one particular rule isn’t always practical.
  • Ideally, it’d be nice to be able to configure the verifier to allow particular exceptions (perhaps to specify that this particular rule should be ignored, or maybe to specify an application-specific list of packages or classes whose absence should be tolerated). But as far as I can see there’s no way to do this at present.
  • Another approach is to inspect the verifier’s results manually so that you can ignore these failures where you want to, but can still see any other problems reported by the verifier. However, it’s always cumbersome and error-prone to have to manually check things after each build, especially where you might have to wade through a long list of “acceptable” errors in order to pick out any unexpected problems.
  • Potentially you could script something to examine the verifier output, pick which warnings and failures should and shouldn’t be ignored, and produce a filtered report and overall outcome based on just the failures you’re interested in. In the absence of suitable options built into the verifier, you could use this approach to support appropriate options yourself. This is probably the most flexible approach (in that you could also use it for any other types of verifier-reported errors that you want to ignore). But it seems like more work than this deserves, and it’d be rather fragile if the messages produced by the verifier ever change.
  • As a last resort, if the library containing the troublesome reference is open-source you could always try building your own customised version with the dependency removed (e.g. find and remove the relevant “import” statements and replace any use of the relevant classes with a suitable run-time exception). Clearly, even where this is possible it will usually be more trouble than it’s worth and will usually be a bad idea, but it’s another option to keep up your sleeve for extreme cases (e.g. to remove a dependency on an unnecessary jar that you can no longer obtain).
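For the scripted-filter option in the list above, the core of it could be as small as the following sketch. It assumes that the verifier’s failures have already been extracted as lines of text and that each “class not found” failure mentions the missing class name somewhere on its line; I haven’t tied it to the verifier’s actual report format, which would need checking. VerifierReportFilter and the idea of a “tolerated prefix” list are hypothetical names of my own.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical filter: drops failure lines that mention a tolerated package prefix. */
final class VerifierReportFilter {

    /**
     * Returns the failure lines that do NOT mention any tolerated prefix,
     * i.e. the ones that still need a human to look at them.
     */
    static List<String> unexpectedFailures(List<String> failureLines,
                                           List<String> toleratedPrefixes) {
        List<String> remaining = new ArrayList<>();
        for (String line : failureLines) {
            boolean tolerated = false;
            for (String prefix : toleratedPrefixes) {
                if (line.contains(prefix)) {
                    tolerated = true;
                    break;
                }
            }
            if (!tolerated) {
                remaining.add(line);
            }
        }
        return remaining;
    }
}
```

A build would then fail only if unexpectedFailures(...) returns a non-empty list. Matching on plain substrings is deliberately crude, which is exactly why this approach is fragile if the verifier’s messages ever change.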

The approach I’ve adopted for the time being is to run the verifier on “adjusted” copies of my applications, but only use this for jars that I’m very confident aren’t needed and aren’t wanted in the “real” application. The actual handling of this is built into my standard build script, which builds the “adjusted” application based on an application-specific list of which extra jars need to be added into it.
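My real build script isn’t shown here, but the “adjusted copy” step amounts to something like the sketch below: copy every entry of the real archive into a new archive, then append the extra jars that only the verifier needs. The class AdjustedArchiveBuilder, its parameters, and the choice of directory for the extra jars are all illustrative assumptions rather than anything the verifier prescribes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical build step: writes a verifier-only copy of an archive with extra jars added. */
final class AdjustedArchiveBuilder {

    /**
     * Copies all entries of 'original' into 'adjusted', then adds each of
     * 'extraJars' under 'libDir' (for example "lib"). Fails with a
     * ZipException if an extra jar's name clashes with an existing entry.
     */
    static void build(Path original, Path adjusted, String libDir, List<Path> extraJars)
            throws IOException {
        try (ZipInputStream in = new ZipInputStream(Files.newInputStream(original));
             ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(adjusted))) {
            byte[] buffer = new byte[8192];
            ZipEntry entry;
            // Copy the real deliverable's entries unchanged.
            while ((entry = in.getNextEntry()) != null) {
                out.putNextEntry(new ZipEntry(entry.getName()));
                int n;
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                out.closeEntry();
            }
            // Append the jars that are only there to satisfy the verifier.
            for (Path jar : extraJars) {
                out.putNextEntry(new ZipEntry(libDir + "/" + jar.getFileName()));
                Files.copy(jar, out);
                out.closeEntry();
            }
        }
    }
}
```

The verifier is then run against the adjusted archive, which is deleted afterwards; the original archive is never modified, so there’s no risk of the extra jars leaking into the real deliverable.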

In the longer term, I’m hoping that the entire approach to this might change anyway… in a world of dynamic languages, OSGi bundles, and whatever eventually comes of Project Jigsaw and other such “modularization” efforts, the existing Java EE rules and packaging mechanisms just don’t seem very appropriate anymore. It all feels like part of the mess that has grown up around packaging, jar dependencies, classpaths, “extension” jars etc., together with the various quirks and work-arounds that have found their way into individual specifications, APIs and tools (often to handle corner-cases and real-world practicalities that weren’t obvious when the relevant specification was first written).

So I’m hoping that at some point we’ll have a cleaner and more general solution to packaging and modularization, and this little quirk and all the complications around it will simply go away.
