Beware of using java.util.Scanner with “/z”

17 12 2011

There are various articles and blog postings around that suggest that using Scanner with a “/z” delimiter is an easy way to read an entire file in one go (with “/z” being the regular expression for “end of input”).

Some examples are:

Because a single read with “/z” as the delimiter should read everything until “end of input”, it’s tempting to just do a single read and leave it at that, as the examples listed above all do.

In most cases that’s OK, but I’ve found at least one situation where reading to “end of input” doesn’t read the entire input – when the input is a SequenceInputStream, each of the constituent InputStreams appears to give a separate “end of input” of its own. As a result, if you do a single read with a delimiter of “/z” it returns the content of the first of the SequenceInputStream’s constituent streams, but doesn’t read into the rest of the constituent streams.

At any rate, that’s what I get on Oracle JDK 5, 6 and 7.

This might be a quirk or bug in Scanner, SequenceInputStream, regular expression processing, or how “end of input” is detected, or it might be some subtlety in the meaning of “/z” that I’m not privy to. Equally, there might be other types of InputStream with constituent sub-components that each report a separate “end of input”. But whatever the underlying reasons and scope of this problem, it seems safest to never assume that a single read delimited by “/z” will always read the whole of an input stream.

So if you really want to use Scanner to read the whole of something, I’d recommend that even when using “/z” you should still iterate the read until the Scanner reports “hasNext” as false (even though that rather reduces the attraction of using Scanner for this, as opposed to some other more direct approach to reading through the whole of the input).
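As a minimal sketch of that advice, the loop below keeps reading until the Scanner reports that nothing is left, and tries it against a SequenceInputStream. The class and method names are hypothetical, and it assumes the delimiter is written in Java source as "\\z" (the regular expression \z), which I take to be the pattern this post refers to.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

class ScannerReadAll {

    // Hypothetical helper: collects every token the Scanner returns instead of
    // trusting a single next() call to deliver the whole input.
    static String readAll(InputStream in) {
        StringBuilder text = new StringBuilder();
        // "\\z" is the Java string literal for the regex \z ("end of input").
        try (Scanner scanner = new Scanner(in, "UTF-8").useDelimiter("\\z")) {
            while (scanner.hasNext()) {
                text.append(scanner.next());
            }
        }
        return text.toString();
    }

    public static void main(String[] args) {
        InputStream first =
                new ByteArrayInputStream("first part, ".getBytes(StandardCharsets.UTF_8));
        InputStream second =
                new ByteArrayInputStream("second part".getBytes(StandardCharsets.UTF_8));
        // A single next() may stop after the first constituent stream;
        // the loop in readAll carries on until hasNext() is false.
        System.out.println(readAll(new SequenceInputStream(first, second)));
    }
}
```

Because the delimiter is zero-width, concatenating the tokens should reproduce the input however many separate reads the Scanner happens to need.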

Java Enum as Singleton: Good or Bad?

4 07 2011

Item 3 in the 2nd Edition of Effective Java explains three ways of implementing a singleton in Java, the last of which is “Enum as Singleton”. This uses an Enum with a single element as a simple and safe way to provide a singleton. It’s stated as being the best way to implement a singleton (at least, for Java 5 onwards and where the additional flexibility of the “static factory method” approach isn’t required).
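For anyone who hasn’t seen it, here is a rough sketch of what the technique looks like. The names (EnumSingletonSketch, Greeter, roundTrip) are purely illustrative and not taken from the book; the round-trip helper is only there to show the serialization behaviour.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

class EnumSingletonSketch {

    // Hypothetical singleton: an enum with exactly one element.
    enum Greeter {
        INSTANCE;

        String greet(String name) {
            return "Hello, " + name;
        }
    }

    // Serializes and then deserializes a value, returning the deserialized copy.
    static Object roundTrip(Object value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Greeter.INSTANCE.greet("world"));
        // Enum constants are deserialized by name, so the copy is the same object.
        System.out.println(roundTrip(Greeter.INSTANCE) == Greeter.INSTANCE);
    }
}
```

Client code refers to the singleton as Greeter.INSTANCE, exactly as it would refer to any other enum constant.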

But is this technique a good or bad idea? Is anyone actually doing it this way? If you’ve used it or encountered it, are you happy with it or do you have any reservations?

Please note: I’m not interested in any wars over whether singletons are evil or not. The concept exists, one comes across them in real code, and there are reasonable discussions to be had over whether they are always a bad idea or have their uses in particular situations. None of that is relevant to how best to implement a singleton if one ever does wish to do so, or the pros and cons of different implementation techniques.

OK, with that dispensed with, what should we make of the “Enum as Singleton” technique?

From my point of view, it works, the code is trivially simple, and it does automatically take care of the “serialization” issue (that is, maintaining one instance per classloader even in the face of serialization and deserialization of the instance). But it feels too much like a trick, and (arguably) not in the spirit of the concept of an enumeration type. When I see an Enum that isn’t being used to enumerate a set of constants and that only has one element, I think I’m more likely to have to stop and figure out what’s going on rather than immediately and automatically thinking “oh, here’s a singleton”. If it becomes more common I’ll no doubt get used to seeing this idiom, but if so I might then find myself misled by any “normal” enumeration that just happens to only have one element.

Another concern is that whilst the use of a static factory method to provide a singleton offers more flexibility than either the use of a public static member or a single-element Enum, it requires different client code for accessing the singleton. So using either of the latter two approaches means that you risk having to change client code if you ever need to “upgrade” the singleton to the more flexible “static factory method” approach.
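To make the difference in client code concrete, here is a small sketch of the three styles side by side. All the class names are made up for illustration; the point is only that the first two are reached through a field-like INSTANCE and the third through a method call.

```java
class SingletonAccessStyles {

    // Style 1: public static final member.
    static final class FieldConfig {
        public static final FieldConfig INSTANCE = new FieldConfig();

        private FieldConfig() {
        }

        String label() {
            return "field";
        }
    }

    // Style 2: single-element enum.
    enum EnumConfig {
        INSTANCE;

        String label() {
            return "enum";
        }
    }

    // Style 3: static factory method. The method could later return something
    // other than one fixed instance without callers having to change.
    static final class FactoryConfig {
        private static final FactoryConfig INSTANCE = new FactoryConfig();

        private FactoryConfig() {
        }

        public static FactoryConfig getInstance() {
            return INSTANCE;
        }

        String label() {
            return "factory";
        }
    }

    public static void main(String[] args) {
        // Styles 1 and 2 read the same at the call site...
        System.out.println(FieldConfig.INSTANCE.label());
        System.out.println(EnumConfig.INSTANCE.label());
        // ...whereas style 3 needs a method call, so switching to it later
        // means editing every caller.
        System.out.println(FactoryConfig.getInstance().label());
    }
}
```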

A further issue is how best to name Enum classes and instances that are implementing singletons. Should one stick to the usual naming conventions for Enums, or adopt some other naming convention (and maybe include “Singleton” in the name to make the intent clear)? And what if the singleton object is mutable in any way? Or is that a more general issue over the naming of enumeration “constants” if they are actually mutable? Or maybe it makes more sense to say that Enums must be genuine constants and should never, ever be mutable – in which case “Enum as Singleton” shouldn’t be used for any singleton with mutable state, which limits its applicability even more?

So now that the “Enum as Singleton” technique has been widely known for a few years, does anyone have any significant experiences from real-world use of it? Or any other opinions on this technique?

Third-time lucky with EJB

30 12 2010

I learnt EJB 1, but never encountered any situation that justified actually using it.

I learnt EJB 2, but never encountered any situation that justified actually using it.

So despite knowing that EJB 3 was much better, and having a general picture of it, I’d been holding off from any detailed reading or study on EJB 3 until specifically needing it.

Well, now I have a development for which EJB 3 seems appropriate, so this time I’m finally using it for real!

In practice this really means JPA with a tiny bit of EJB on top, as “Entities” aren’t technically EJBs any more (to the extent that if an EJB jar consists of nothing but “entities” it’s considered invalid due to not containing any EJBs).

On the whole I’ve been quite impressed by EJB 3 and JPA. The basic programming is, as advertised, much cleaner and simpler than before, and it lives up to its reputation of being much easier to get started with and needing far less “boilerplate” code.

Inevitably it’s taken a fair amount of work to arrive at suitable design choices, naming and coding conventions, build script enhancements, test facilities etc, and in general to address all the myriad of assorted issues and choices that crop up whenever one adopts any additional technology. And, of course, once you get into anything non-trivial you’re dragged into the usual pile of quirks and work-arounds and implementation-specific bugs. You only need to take a quick look through the JPA FAQ and read some of the referenced discussions to get a feel for how tricky and arbitrary some of this can be.

But overall, incorporating EJB 3 and JPA into a couple of new projects has all been relatively straightforward, and certainly no worse than is par for the course these days.

I still wish I could somehow claw back the time I’d previously spent learning EJB 1 and 2 (though not nearly as much as the time I spent learning Microsoft COM technologies!). But what’s done is done, and so far I’m very pleased with how easy and effective EJB 3 and JPA now appear to be.

If you’re still holding back from EJB and JPA because of bad experiences with previous versions, I can add my voice to the chorus of people saying it’s worth a fresh look.

Servlet 3.0 – A spaghetti API?

26 07 2010

The introduction in Servlet 3.0 of “web fragments” and both annotation-based and programmatic mechanisms for introducing components into a web-application are all very welcome.

However, combined with all the other new features, their configuration facilities, the relevant class/jar-finding mechanisms, and the interactions between everything, the overall complexity of the Servlet API seems to have increased horrendously.

To my mind, an awful lot of it is starting to look like a tangled mess of spaghetti – the API equivalent of spaghetti code.

Here’s just one relatively minor example (but please, please, please put me straight if I’ve missed the meaning of this and it’s all really simple and elegant).

Here goes…

The Javadoc of every “since 3.0” method in javax.servlet.ServletContext (for example, getEffectiveMajorVersion) includes a “throws” clause that says:

Throws: java.lang.UnsupportedOperationException – if this ServletContext was passed to the ServletContextListener#contextInitialized method of a ServletContextListener that was neither declared in web.xml or web-fragment.xml, nor annotated with WebListener

So the behaviour of a ServletContext, including things like whether or not you can determine which Servlet version it needs, thus depends on whether it “was passed to” a ServletContextListener to notify that listener of the context’s initialization – depending on which of various ways were used to create the listener.

For now let’s just gloss over the various minor questions and issues raised by this, such as:

  • What does “was passed to” actually mean? Has been passed to, at any time previously? Is currently being processed within a call to? Both? Something else?
  • Does or doesn’t this apply if the ServletContext “was passed to” multiple listeners of which some are of the specified type and some are not?
  • What is the actual purpose of this rule (i.e. why should being passed to a particular type of listener prevent the ServletContext from processing any of its “since 3.0” methods)?

Quite apart from all that, and far more fundamentally, isn’t it rather perverse for an object’s methods to depend directly on what other objects it “was passed to”? Especially where there doesn’t seem to be any immediately obvious reason for such a dependency?

And doesn’t it seem even more wrong that an object’s behaviour should depend on which other objects are “listening” for events on it? Isn’t that the tail wagging the dog?

Even assuming there’s some reasonable reason for this, and that there’s some sense in which it makes some kind of sense, is this really the kind of thing we want to see in an API?

Just in case this still seems too simple for you, the ServletContext also now includes a createListener method for creating listeners, and a number of overloaded addListener methods for adding listeners to itself (but only provided it has not already been initialized). The method for creating listeners does allow the creation of ServletContextListeners, but the methods for adding listeners only support the addition of a ServletContextListener “If this ServletContext was passed to ServletContainerInitializer#onStartup” (which I’ll come to later).

Now both of these methods are subject to various conditions, including the “throws” clause described above. Listeners created and added in this way are, presumably, precisely the sort of listeners that such “throws” clauses are referring to (that is, not defined in web.xml or web-fragment.xml and potentially not annotated with WebListener). But what does it mean for methods that create and add such listeners to also have this “throws” clause themselves? Especially when they also require the ServletContext to not yet have been initialized, in which case it presumably can’t have been passed to any ServletContextListeners yet anyway?

Is anyone else getting confused yet?

If even that still seems simple enough, ServletContextListeners are also no longer the only things listening for the application and/or context’s initialization. There is also now a ServletContainerInitializer interface, for classes that want to handle the application’s start-up (or does it really mean the container’s start-up, as its name would seem to imply?). Clearly, this is another route through which ServletContextListeners can be programmatically created and introduced, in particular by having the ServletContainerInitializer use the ServletContext’s “createListener” and/or “addListener” methods – with the “addListener” methods making specific allowance for this as described above, and requiring the ServletContext to know whether or not it “was passed to” a ServletContainerInitializer.

Of course, this ServletContainerInitializer interface has its own complexities and quirks. I won’t go into full detail on these here, but just to give a flavour:

  • It specifies naming conventions and mechanisms for how its implementing classes are found (and these mechanisms have their own quirks and ambiguities, for example the naming convention appears to require classes to be placed in the “javax.servlet” package, in violation of the usual rules and licence terms, and the class-level javadoc says that implementations must be within the application’s /WEB-INF/services directory but the relevant method’s javadoc talks about different behaviour depending on whether it is within /WEB-INF/lib or elsewhere);
  • It uses an annotation to specify what types of classes are to be passed to its sole method as arguments, together with rules for how the relevant classes are to be found, with this in turn including a requirement for the container to provide “a configuration option” to control whether failure to find such classes should be logged;
  • Its javadoc includes the quite wonderful statement “In either case, ServletContainerInitializer services from web fragment JAR files excluded from an absolute ordering must be ignored, and the order in which these services are discovered must follow the application’s classloading delegation model.”.

Am I alone in thinking this is all getting way out of hand? How many features like these (with their accompanying restrictions, exclusions and interactions) does it take before the API as a whole becomes incomprehensible?

At this point I was going to sarcastically suggest some incredibly complex and convoluted fictional requirement for things I’d like to see added into the next version of the API. But I’m too afraid that someone might treat it as a serious feature request, and in any case it’s not easy to come up with anything that’s more convoluted than the existing features (at least, not without sounding completely silly).

So instead I’ll just say that, personally, I fear that the Servlet API may have already jumped the shark.

Why would you ask for zero bytes from a Java InputStream?

12 04 2010

When would one pass a length argument of zero to InputStream.read(byte[], int, int) so as to not read any bytes? Does anyone have a good example of when this is necessary or convenient?

The method’s javadoc shows that it explicitly caters for being passed a length of zero, but to me that looks like an unnecessary complication that has plenty of potential for misunderstanding, incorrect implementation by subclasses, and a risk of infinite loops in client code.

I’ve been trying to imagine what common situation might justify catering for a request to read zero bytes, but haven’t come up with anything convincing.

The actual wording of the Javadoc is “If len is zero, then no bytes are read and 0 is returned; otherwise…” and then goes on to explain its normal processing within the “otherwise…” clause, including the handling of end-of-stream and any IOExceptions that might occur. There are some separate paragraphs before and after this, and separate explanations of argument validation, but it seems quite clear that if the length is zero the “if len is zero” statement applies instead of the normal processing and its various conditions and outcomes.

At first glance that seems straightforward and simplifies things – the remainder of the rules only apply for non-zero lengths.

However, it’s not as simple as it seems:

  • If you’re already at end-of-stream, reading zero bytes will complete as normal, won’t change anything, and will return zero. It’s easy to see how a caller could get stuck in an infinite loop if they’re not explicitly checking for this. (Conversely, if the caller is explicitly checking for a result of zero, it wouldn’t appear to be any harder for the caller to instead check for a length of zero beforehand and avoid the call altogether). It also means you can’t use a read of zero bytes as a safe way of just checking whether you’ve reached end-of-stream yet.
  • The javadoc says, quite separately, that an IOException is thrown if the stream is already closed. It isn’t clear which condition takes precedence if zero bytes are requested but the stream is also already closed. More generally it’s not clear whether this is specifying that an IOException SHOULD be thrown if the stream is already closed or just explaining that this MAY result in an IOException (i.e. if an attempt to actually use the stream happens to result in such an exception). So depending on how you read it, you can argue either that an attempt to read zero bytes when the stream is already closed should complete normally and return zero, or that it should throw an IOException.
  • It’s invalid to specify an offset and length that together exceed the size of the destination array (such that writing the bytes into the array would go out-of-bounds). This appears to apply even if the length is zero, and that is indeed how it’s implemented in the source code (at least, in the Sun JDK 6 source code). But this is somewhat inconsistent with the general treatment of a zero length as returning zero regardless of other issues (e.g. even if already at end-of-stream). Arguably it would be more appropriate and more consistent with the rest of the specification to completely ignore the offset and array arguments if the length is zero and you’re not actually going to read any bytes into the array.
  • If a call successfully reads one or more bytes but then encounters an exception, the read ends at that point and returns normally, with the exception then being thrown for the first byte of the next read. But if the next read is for zero bytes, it will complete successfully without even attempting a read, and won’t encounter the exception. Whilst that’s in keeping with the normal behaviour of the method, it’s yet another thing that callers asking for zero bytes might need to be aware of and cater for (depending on exactly what they’re doing and how the read of zero bytes arises).
  • InputStream implementations aren’t entirely consistent with this specification, even within the JDK. In particular, the Javadoc for ByteArrayInputStream.read(byte[], int, int) says that it tests for end of stream and returns -1 prior to considering whether to read any bytes or return zero. Hence if a ByteArrayInputStream is at end of stream and you ask to read zero bytes, it gives you -1 to indicate end-of-stream rather than zero as specified by the underlying InputStream base class. With the various ambiguities noted above, third-party InputStream implementations of this method are probably even more likely to be inconsistent in how they handle reads of zero bytes.
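A small experiment along these lines can be sketched as follows. The class, the EmptyStream subclass and the zeroLengthRead helper are all hypothetical names of my own; EmptyStream simply inherits the bulk read from java.io.InputStream so that the base-class behaviour can be observed directly. Results that might vary between implementations are printed rather than assumed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ZeroLengthReadSketch {

    // Hypothetical stream that is always at end-of-stream and relies on the
    // read(byte[], int, int) implementation inherited from InputStream.
    static final class EmptyStream extends InputStream {
        @Override
        public int read() {
            return -1;
        }
    }

    // Asks the given stream for zero bytes and returns whatever it reports.
    static int zeroLengthRead(InputStream in) {
        try {
            return in.read(new byte[8], 0, 0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Base-class behaviour: zero is returned even though the stream is at
        // end-of-stream, so a loop waiting for -1 would never see it here.
        System.out.println("InputStream at end: " + zeroLengthRead(new EmptyStream()));

        // An exhausted ByteArrayInputStream; the result is printed so it can
        // be compared with the base class on whichever JDK is in use.
        ByteArrayInputStream exhausted = new ByteArrayInputStream(new byte[0]);
        System.out.println("ByteArrayInputStream at end: " + zeroLengthRead(exhausted));
    }
}
```

A caller that can legitimately end up with a zero length is probably safest checking for it before calling read at all, rather than relying on any particular stream’s answer.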

So why isn’t a length argument of zero just prohibited? As far as I can see, the typical use of this method shouldn’t normally involve passing a length of zero, and any client code that really can result in a legitimate call for a length of zero is probably going to have to do something to explicitly handle it anyway (for example, to avoid getting stuck in a loop). The length is already required to be non-negative, so why isn’t it just required to be greater than zero instead? That would seem to be a lot simpler and less open to misinterpretation, misuse or incorrect implementation.

What am I missing? Can anyone enlighten me with a good example of something that benefits from being able to ask for zero bytes? That is, a relatively common use of InputStream.read(byte[], int, int) where passing a “len” argument of zero can actually occur, and where allowing this is significantly more convenient for callers than requiring the caller to explicitly check for and handle this case itself.

Please note that I’m not for a moment suggesting that something this well-established could realistically be changed at this point. I’m just curious as to why it is the way it is. Is it a mistake? A lack of attention to detail that we’re now stuck with? Or is there a really good reason for it that I just haven’t come across yet?

ObMimic: Very, very late – but still alive

21 11 2009

Updated Feb 2018: OpenBrace Limited has closed down, and its ObMimic product is no longer supported.

It’s been a long, long time since I last mentioned ObMimic, so an update seems necessary.

At the time I thought it wouldn’t take more than a couple of months or so to get everything ready and put up a public website, discussion forums, bug list, and everything we need internally to release and support ObMimic.

Well, here we are, well over a year later, and there’s still no public website or ObMimic release.

So it seems worth a quick post to explain that ObMimic isn’t dead; it isn’t forgotten; it has just been much delayed and postponed, and is still “on the way”.

Why this long delay?

Well, partly this is the usual story that everything takes much longer than expected, even after you’ve allowed for the fact that everything takes much longer than expected.

Partly it’s from needing longer than expected to handle all the bugs and kludges encountered in the many third-party applications, tools and services involved, and getting everything to work nicely together.

Partly it’s a deliberate decision to wait until everything is reasonably complete and reliable before going public, even if that causes delay, rather than rushing out a quick-and-nasty website with lots of problems (and leaving us with a backlog of work and ongoing firefighting).

But mainly it’s due to having been sucked into other commitments, and spending time on several other things in order to “keep afloat” in the meantime. Such distractions have always seemed like a bad idea, but over the last year or so they’ve been a necessary evil.

It’s not that we’ve not been working on it. It’s just that there are always so many things that need doing, and the months fly by at alarming speed. As Brooks says in The Mythical Man Month: “How does a project get to be a year late? … One day at a time.” (but as also quoted there, “Good cooking takes time. If you are made to wait, it is to serve you better, and to please you”).

Anyway, during all these delays I’ve been deliberately keeping a low profile, and have wanted to steer clear of talking much about ObMimic until I can be sure that it’s genuinely imminent.

Rest assured though, a public beta and full release are still intended as soon as it’s ready (and we are getting there, despite the absence of any outward signs of progress); the product itself is still being maintained and improved in the meantime; and for anyone that wants an early look at it, a “private beta release” is still available if you contact me.

ArbitraryObject: A useful useless class

23 10 2009

I’ve recently introduced a trivial little class called “ArbitraryObject” into my Java test-case code. Here’s the full story…

When writing test cases in Java, every now and then one comes across a situation where an object is needed but its precise type doesn’t matter, and you just need to pick some arbitrary class to use as an example.

Sometimes any class at all will do; sometimes there are constraints on what the class must not be, but anything else will do (e.g. anything that isn’t class X or one of its subclasses or superclasses).

Most commonly this happens for tests of methods that take “Object” arguments – an obvious example is testing that an implementation of “equals(Object)” returns false for any argument that isn’t of the appropriate type.

Another common case is testing of generic classes and methods with “any class” generic parameters, where one needs to pick a class to be used as the generic parameter’s actual type.

In these situations, what class should one use?

Perhaps the simplest choices are Object or String. However, Object seems a poor choice for this in general – if you’re testing something that takes any Object, you probably want to test it with something more specific than Object itself (even if you do also want to test it with a basic Object). It’s also not going to work where you need something that isn’t inheritance-related to some particular class.

Similarly, although String can be very convenient for this, strings are so common as argument values and in test-case code that their use tends to blend into the background. So it’s hard to see when a string is being used for this purpose as opposed to being a “real” use of a string.

More generally, if you’re trying to show how some code handles any arbitrary type, neither Object nor String seem the most useful or convincing examples to pick.

What we’re really looking for is a class that meets the following criteria:

  • It shouldn’t be relevant in any way to the class being tested (isn’t the class being tested, doesn’t appear as a method argument anywhere in the class being tested, and isn’t a superclass or subclass of such types);
  • It shouldn’t be used otherwise in the test-case code (so as to avoid any confusion);
  • Ideally it ought to be somewhat out-of-the-ordinary (so that we can reasonably assume that the code being tested doesn’t give it any special treatment, and so that its use in the test-case code stands out as something unusual, and so as to emphasise that it’s just an arbitrary example representing any class you might happen to use);
  • It should be easy to construct instances of the class (it should have a public constructor that doesn’t require any non-trivial arguments or other set-up or configuration);
  • There shouldn’t be any significant side-effects or performance impact from creating and disposing of instances and using their Object methods such as equals/hashCode/toString (e.g. these shouldn’t do anything like thread creation, accessing of system or network resources etc).

Until now I’ve been picking classes for this fairly arbitrarily. Sometimes I just grab one of the primitive-wrapper classes like java.lang.Float or perhaps java.math.BigInteger if these aren’t otherwise involved in the code – even though they’re rather too widely used to be ideal for this. Otherwise I’ve picked something obscure but harmless from deep within the bowels of the JDK, such as

The problems with this approach are:

  • The intention and reason for using the chosen class aren’t obvious from the code;
  • The test-case ends up with an otherwise-unnecessary and rather misleading “import” and dependency on the chosen class (unless it’s a java.lang class, but the most suitable of those suffer the drawback of being too widely used);
  • Any searches for the chosen class will find these uses of it as well as its “genuine” uses;
  • There’s no easy way to find everywhere that this has been done (for example, if I ever want to change how I handle these situations).

So instead I’ve now started using a purpose-built “ArbitraryObject” class.

The only purpose of this class is to provide a suitably-named class that isn’t related to any other classes, isn’t otherwise relevant to either the test-case code or the code being tested, and isn’t used for any other purpose.

The main benefit is that this makes the intention of the test-case entirely explicit. Wherever ArbitraryObject is used, it’s clear that it represents the use of any class, at a point where a test needs this. In addition, the test-case code no longer has any dependencies on obscure classes that aren’t actually relevant; it’s easy to find all the places where this is being done; and searches for other classes aren’t going to find any “accidental” appearances of a class where it’s been used for this purpose.

ArbitraryObject must be the most trivial class I’ve ever written. Not even worth showing the code! It’s just a subclass of Object with a public no-argument constructor and nothing else.
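Still, for completeness, here is a sketch of roughly what it amounts to, together with the kind of test it is meant for. The Temperature class and the demo class are hypothetical stand-ins for “the code being tested”, and the classes are left package-private only so that the whole sketch fits in one file.

```java
// A direct subclass of Object with a public no-argument constructor
// and nothing else.
class ArbitraryObject {
    public ArbitraryObject() {
    }
}

// Hypothetical class under test, with a typical equals(Object) implementation.
final class Temperature {
    private final double degrees;

    Temperature(double degrees) {
        this.degrees = degrees;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof Temperature
                && Double.compare(((Temperature) other).degrees, degrees) == 0;
    }

    @Override
    public int hashCode() {
        return Double.hashCode(degrees);
    }
}

class ArbitraryObjectDemo {
    public static void main(String[] args) {
        Temperature temperature = new Temperature(21.5);
        // The argument's type is irrelevant to the test, and the name says so.
        if (temperature.equals(new ArbitraryObject())) {
            throw new AssertionError("equals should reject an unrelated type");
        }
        System.out.println("equals rejects an arbitrary unrelated object");
    }
}
```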

Potentially one could argue for additional features, such as giving each instance a “name” to be shown in its “toString” representation, making it Serializable, and so forth. But none of that seems worth bothering with.

So this ArbitraryObject class is entirely trivial, and as a class it’s kind of useless, but the name in itself is useful to me.

Sometimes all you need is an explicit name.
