Beware of using java.util.Scanner with “\z”

17 12 2011

There are various articles and blog posts around suggesting that using Scanner with a “\z” delimiter is an easy way to read an entire file in one go (“\z” being the regular expression for “end of input”).

Some examples are:

Because a single read with “\z” as the delimiter should read everything until “end of input”, it’s tempting to just do a single read and leave it at that, as the examples listed above all do.

In most cases that’s OK, but I’ve found at least one situation where reading to “end of input” doesn’t read the entire input: when the input is a SequenceInputStream, each of the constituent InputStreams appears to give a separate “end of input” of its own. As a result, a single read with a delimiter of “\z” returns the content of the first of the SequenceInputStream’s constituent streams, but doesn’t read on into the rest of them.
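As a rough illustration of the situation described above, here is a minimal sketch that does one “\z”-delimited read from a SequenceInputStream built from two in-memory streams. The class name, method names and sample strings are made up for this example, and what gets printed may differ between JDK versions, so treat it as something to try rather than a definitive reproduction.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

// Hypothetical demo class; none of these names come from the articles discussed.
class SingleReadDemo {

    // The tempting idiom: one read delimited by \z ("end of input").
    static String readOnce(InputStream in) {
        try (Scanner scanner = new Scanner(in, "UTF-8").useDelimiter("\\z")) {
            return scanner.hasNext() ? scanner.next() : "";
        }
    }

    // A SequenceInputStream made of two constituent streams (sample data).
    static InputStream twoPartStream() {
        InputStream first = new ByteArrayInputStream(
                "first part;".getBytes(StandardCharsets.UTF_8));
        InputStream second = new ByteArrayInputStream(
                "second part".getBytes(StandardCharsets.UTF_8));
        return new SequenceInputStream(first, second);
    }

    public static void main(String[] args) {
        // If the behaviour described above occurs, this shows only the
        // content of the first constituent stream.
        System.out.println("Single read returned: [" + readOnce(twoPartStream()) + "]");
    }
}
```

Comparing the printed value against the full expected content is a quick way to check whether a given JDK is affected.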

At any rate, that’s what I get on Oracle JDK 5, 6 and 7.

This might be a quirk or bug in Scanner, SequenceInputStream, regular expression processing, or how “end of input” is detected, or it might be some subtlety in the meaning of “\z” that I’m not privy to. Equally, there might be other types of InputStream with constituent sub-components that each report a separate “end of input”. But whatever the underlying reasons and scope of this problem, it seems safest never to assume that a single read delimited by “\z” will always read the whole of an input stream.

So if you really want to use Scanner to read the whole of something, I’d recommend that even when using “\z” you keep repeating the read until the Scanner’s hasNext() returns false (even though that rather reduces the attraction of using Scanner for this, as opposed to some other more direct approach to reading through the whole of the input).
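The recommendation above could be sketched roughly as follows. The class and method names are hypothetical and UTF-8 is an assumed encoding. Because “\z” is a zero-width match, joining the pieces returned by successive reads should not drop any characters.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

// Hypothetical helper; a sketch of the "keep reading until hasNext() is false" advice.
class ScannerReadAll {

    // Reads the whole stream by repeating the \z-delimited read
    // rather than trusting a single call to next().
    static String readAll(InputStream in) {
        StringBuilder content = new StringBuilder();
        try (Scanner scanner = new Scanner(in, "UTF-8").useDelimiter("\\z")) {
            while (scanner.hasNext()) {
                content.append(scanner.next());
            }
        }
        return content.toString();
    }

    public static void main(String[] args) {
        // Sample input: two in-memory streams joined by a SequenceInputStream.
        InputStream in = new SequenceInputStream(
                new ByteArrayInputStream("first part;".getBytes(StandardCharsets.UTF_8)),
                new ByteArrayInputStream("second part".getBytes(StandardCharsets.UTF_8)));
        System.out.println(readAll(in));
    }
}
```

The loop costs only a few extra lines, and it gives the same result whether the Scanner hands back the input in one piece or in several.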







