As discussed in the previous article Some issues in the HttpServlet source code, I’ve recently noticed a number of problems in the source code of javax.servlet.http.HttpServlet, primarily in how its “doHead” method caters for HTTP “HEAD” requests. This week I’ve finally got round to reporting them to the relevant mailing list, and have been following up the response.
It took me a while to decide where to report these problems. There doesn’t seem to be any public forum for the Servlet API expert group, and although the code is present in Tomcat and Glassfish (and presumably elsewhere too), it’s unlikely that those projects are responsible for maintaining the code. The two related bugs that are present in the Tomcat bug list had long ago been referred to the Servlet Specification “expert group” as being their responsibility rather than Tomcat’s. So I tried e-mailing the Servlet API “feedback” address but got no reply (though as it happens, my current ISP’s e-mail system is currently very unreliable, so it’s possible it just didn’t get sent). There’s a “Servlet API” category in Sun’s bug database, but there’s nothing relevant on it since 2002, and the form for creating entries asks for selection from a list of version numbers that are clearly for something entirely different. I started trying to raise an “issue” on the Glassfish project, but that required getting “observer” rights on the project, so whilst waiting for that I posted to the Tomcat developer’s mailing list, hoping to at least get an answer to where I should really be raising this.
I was very pleasantly surprised to get an immediate and very positive response from a Jan Luehe of Sun. He raised this as a bug report for the Servlet expert group, at “https://servlet-spec-eg.dev.java.net/issues/show_bug.cgi?id=40” (which isn’t publically accessible) and as an issue on Glassfish at https://glassfish.dev.java.net/issues/show_bug.cgi?id=2212. Within a day or so he came back with patches, including a fix for another problem that I hadn’t spotted.
He also pointed out that the Servlet Specification “expert group” is now a project on java.net (project “servlet-spec-eg”). It’s not public, so you have to request a suitable role if you want to be involved (e.g. “Observer” role to view things and raise issues). It’d probably be quite useful for me to join, but I don’t feel I can at present because the licence conditions on the Servlet Specification itself might preclude me from using it my current work, so I’m being careful to not use it – and I don’t want to risk being “tainted” by having access to non-public discussions. There also appears to be a parallel public project, which I hadn’t spotted before, but it doesn’t seem to have any substantive content yet.
Jan Luehe’s fixes look OK as far as they go, and fix many of the specific bugs that I’d reported. But whilst looking at it, and trying to think though which issues were and weren’t fixed, I realised there are both more specific bugs and far more fundamental problems with the entire approach of having HttpSerlvet “wrap” the response in a “NoBodyResponse”. (At this point, you probably need to refer to the previous article and get a copy of the HttpServlet source code to look at (from Tomcat, Glassfish, or somewhere online such as at DocJar) if you want to follow the detail of the rest of this discussion – or just skip to the end for conclusions).
Firstly, there are many other features of the ServletResponse that the current approach just doesn’t cater for. For example, it doesn’t make the response’s isCommitted return true when the amount written reaches the buffer size or the explicitly-specified length, calls to reset and resetBuffer don’t affect the resulting content length, and calls to RequestDispatcher.forward during the servlet don’t affect the resulting content length. It’s hard to see how the latter could be implemented.
Secondly, and more generally, the NoBodyResponse wrapper doesn’t correctly handle responses that are themselves NoBodyResponse instances. That is, if an HttpServlet’s processing passes the response to another HttpServlet (via a request dispatcher “include”), it doesn’t work. You end up with one NoBodyResponse wrapping another, but they are each separately and independently, maintain separate instances of their fake OutputStream and other internal fields, and both initialize themselves to reflect the “start” of response processing. When you follow this through, the “inner” servlet initializes its NoBodyResponse wrapper from scratch (ignoring anything that has already happened to the response), then at the end it explicitly sets the response’s contentLength to reflect the amount of data written during the “inner” servlet (and passes this on to the wrapped NoBodyResponse and the underlying HttpServletResponse). Then on return from the “inner” servlet, the “outer” servlet sees its NoBodyResponse has already had its contentLength explicitly set. So any further processing within the “outer” servlet sees the contentLength as alread set (despite no explicit setting of it by the “inner” servlet’s code), and at the end of the “outer” servlet the fact that the contentLength has already been set results in it being left unchanged, regardless of what content the “outer” servlet has written. You end up with a response whose content-length has been explicitly set to the amount of data written during the “inner” servlet on its own, ignoring anything written by the “outer” servlet.
Similarly, the proposed fix for making getOutputStream cause getWriter to fail and vice-versa won’t fully work if it’s based on each instance maintaining its own record of which of these methods have been called. For example, when one NoBodyResponse is wrapping another, calling getOutputStream on the wrapped NoBodyResponse must prevent subsequent calls to getWriter on the wrapping NoBodyResponse. The same goes for all of the various effects that calling getWriter is supposed to have on any subsequent calls to the response’s various methods (such as getCharacterEncoding). In fact, the correct implementation for getOutputStream and getWriter in NoBodyResponse is probably to call the wrapped response’s corresponding method and just ignore the returned object – that way all of the normal effects of getOutputStream and getWriter are obtained, and it works even when there are multiple layers of wrapper… and all without having to override getCharacterEncoding etc (because ultimately, the “real” getOutputStream/getWriter has really been called).
The broader problem is that HttpServlet code is effectively assuming that each HttpServlet instance is carrying out the entire processing of the request: it assumes there is nothing beforehand for it to take into account, that the data that is written during its processing is the whole of the response, and that on completion it can indicate the content length by explicitly setting the response’s contentLength even though doing so could theoretically have an effect on any further processing of the response. However, this isn’t true. When filters and request dispatcher forwards/includes are involved, the HttpServlet isn’t necessarily doing all of the processing. There may be filters processing the request and response before and/or after the HttpServlet, and the HttpServlet might be being invoked as a result of a request-dispatcher “include” within another Servlet (which itself might not be an HttpServlet).
So even if the HTTP “HEAD” processing within HttpServlet were entirely fixed, this doesn’t really give us correct automatic handling of “HEAD” requests. Any filters, non-HttpServlet servlets and any other such components involved in processing the request also need to cater for “HEAD”. What is more, these all have to fit together – each one has to know how to integrate the content length of anything called within it into its overall content length. For example, when invoking an HttpServlet the invoking code needs to know that it will indicate the length of its own content by setting the response’s contentLength, and that this needs to be added to the length of anything written before or after the HttpServlet. And any such code faces all the same issues as HttpServlet does – how to suppress the output whilst counting its length, in a way that still tracks whether getOutputStream and/or getWriter have been called and adjusts the response’s behaviour accordingly, how to handle explicit setContentLength calls or the absence of them, how to handle reset/resetBuffer calls, automatic “commit” on buffer overflow etc, reset of output on “forward” etc.
All in all this seems insanely complicated, error prone and impractical. As evidenced by the fact that even the HttpServlet code still has it wrong, after all this time, and after an attempt to fix it. Most people simply won’t be aware of these issues, won’t bother with it even if they are (I can’t imagine anyone pausing whilst writing a filter and thinking, hey, now I need to build in
support for HEAD requests), and will get it wrong even if they bother with it.
I think there are two fundamental issues here:
- It’s very, very hard to filter/wrap/adjust the behaviour of a ServletResponse’s OutputStream. There are just too many “under the covers” interactions between the output stream and the rest of the response. HttpServlet tries to adjust the output stream’s processing by wrapping the response so as to use a totally separate output stream, but this doesn’t work properly because it can’t intercept or reproduce all of these interactions. This is probably also true of many people’s attempts to write filters to adust the output (for compression etc). Maybe what we really need is something built into ServletResponse that lets you install instances of an “OutputStreamWrapper” for the response’s actual ServletOutputStream to use as a wrapper around its underlying output stream. This could then let the ServletOutputStream operate as normal, whilst modifying the bytes actually written to the real output. That’d be much simpler than trying to find a way to use an entirely unrelated output stream and then having to cater for all the interactions that only the real, implementation-specific ServletOutputStream can implement.
- Suppressing the output of body content during a “HEAD” request is broader in scope than any one type of component (such as HttpServlet) can deal with. It’s a behaviour of the entire processing of the request. We need a simple way to guarantee that by default the entire processing of the request is exactly the same as for a “GET”, regardless of what components are involved and regardless of how complex the application code, whilst also guaranteeing that no actual body content is returned to the client. I think the simplest way to do this would be to simply mandate (e.g. in the next version of the Servlet Specification) that the servlet container is responsible for suppressing the sending of the body content for responses to HTTP “HEAD” requests (for example, must stop sending once the end of the headers have been sent). Presumably, all servlet containers must already be doing this anyway for “HEAD” requests for static resources. They should just do the same for “HEAD” requests for dynamic resources (if they don’t do so already). As far as I can see this would be simple, safe, and effective, and would entirely eliminate NoBodyResponse and its associated processing. The default HttpServlet “doHead” would then just be a simple call to “doGet”. Servlets could still override “doHead” if they want to provide more efficent “HEAD” processing, but would have the additional safeguard that even if they get it wrong there wouldn’t be any body content sent out. For compatibility, you’d maybe want to retain the existing code and behaviour for older Servlet API versions, with a documented warning that it may give incorrect or inaccurate results for some circumstances.
So I’ve passed all of this on to Jan Luehe, and am waiting to see his views on it. I guess that means there might be another frighteningly long article on this in a while…