For session tracking by means of URL rewriting (where the session ID is added to URLs as a path parameter when necessary), the Javadoc for HttpServletResponse makes it clear that encodeRedirectURL should be used for URLs that are being passed to sendRedirect, and that encodeURL should be used for all other URLs. It also says that the reason for having two separate methods is that the rules for deciding whether or not to insert the session ID into the given URL “can differ” between the two methods.
However, the Javadoc does not explain what the relevant rules are, nor why the two methods may need to use different rules, nor whether any specific differences are required between the two methods or whether this is merely a provision through which servlet-container implementations can provide container-specific differences if necessary (or through which differences could be introduced in future).
It has taken a bit of hunting around to find any reasonable explanation of the actual differences between encodeURL and encodeRedirectURL. So even though I’m not convinced I’ve got the entire story or understood all the implications, it seems worth summarising my findings here, so as to provide one more place where people can find details of this.
The only specific answers I could find came from the source code of the Tomcat and GlassFish “reference implementations”, and from a comp.lang.java.help message by Saad Malik, which is itself based on examination of the Tomcat source code.
Looking at the GlassFish source code (as the “latest” reference implementation), the only actual difference between encodeURL and encodeRedirectURL seems to be:
- When inserting the session ID into an empty-string URL, encodeURL first converts the empty string into an absolute URL and inserts the session ID into that, whereas encodeRedirectURL treats an empty-string URL just like any other. That is, encodeRedirectURL attempts to insert the session ID into the empty string itself, but the code that performs the insertion then returns the empty string unchanged, because it does not insert the session ID if the given string has nothing prior to any query string or fragment identifier.
- This somewhat contradicts the Javadoc statement that the rules for deciding whether to insert the session ID can differ between the two methods: the rules for deciding whether to insert it are exactly the same, and what differs is how the insertion is done.
- There is a comment in the encodeURL source code that justifies its use of the absolute URL if the given URL is an empty string by saying “W3c spec clearly said”. However, this comment doesn’t identify the particular W3C specification involved or on what basis it is believed to require this, so it remains conjecture as to exactly why this is required (see below).
- As this is just code in the reference implementation, it remains unclear to what extent this is the required behaviour of these methods and to what extent it just happens to be what that particular implementation has chosen to do (and it is not even clear whether and why this behaviour is correct).
- The conversion of the given empty-string URL to an absolute URL is based on the URL of the corresponding request (excluding any query string or fragment identifier). It is somewhat unclear whether this should take into account any RequestDispatcher “forwarding” of the request, or whether it should be for the original URL of the request as used by the client when making the request. Given the intended usage of the encodeURL method, I’d assume that the empty-string URL should be taken as relative to the resource that was requested by the client, as per the original request’s getRequestURL (that is, regardless of any subsequent “wrapping” of the request due to any “forwarding” or other such adjustment of the request during its processing). But that’s just my own assumption.
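To make the above concrete, here is a minimal sketch of the behaviour as I understand it. It is not the GlassFish or Tomcat code: the class and method names are made up for illustration, the request URL and session ID are passed in as plain strings, and the decision about whether to insert a session ID at all is reduced to a simple null check (the real rules are more involved).

```java
// Hypothetical, simplified model of the behaviour described above.
// Not the actual GlassFish/Tomcat source; names and signatures are assumptions.
final class SessionUrlModel {

    // Inserts ";jsessionid=<id>" before any query string or fragment identifier.
    // Returns the URL unchanged if nothing precedes the query string or fragment
    // (the "jsessionid can't be first" case).
    static String insertSessionId(String url, String sessionId) {
        int cut = url.length();
        int query = url.indexOf('?');
        int fragment = url.indexOf('#');
        if (query >= 0) {
            cut = Math.min(cut, query);
        }
        if (fragment >= 0) {
            cut = Math.min(cut, fragment);
        }
        if (cut == 0) {
            return url;
        }
        return url.substring(0, cut) + ";jsessionid=" + sessionId + url.substring(cut);
    }

    // Models encodeURL: an empty-string URL is replaced by the absolute request URL
    // (without query string or fragment) before the session ID is inserted.
    static String encodeURL(String url, String requestUrl, String sessionId) {
        if (sessionId == null) {
            return url; // stand-in for "no session ID needs inserting"
        }
        String target = url.isEmpty() ? requestUrl : url;
        return insertSessionId(target, sessionId);
    }

    // Models encodeRedirectURL: an empty-string URL gets no special treatment.
    static String encodeRedirectURL(String url, String sessionId) {
        if (sessionId == null) {
            return url;
        }
        return insertSessionId(url, sessionId);
    }

    public static void main(String[] args) {
        String requestUrl = "http://example.com/app/page";
        System.out.println("[" + encodeURL("", requestUrl, "ABC") + "]");
        System.out.println("[" + encodeRedirectURL("", "ABC") + "]");
        System.out.println("[" + encodeURL("list?page=2#top", requestUrl, "ABC") + "]");
    }
}
```

In this model the two methods only diverge for the empty string: encodeURL yields the request URL with the session ID appended, while encodeRedirectURL yields the empty string again.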
As noted above, the justification for encodeURL converting empty-string URLs into absolute URLs when inserting session IDs into them is said to be the “W3c spec”, but no precise details are given. This presumably means RFC 3986, or more likely the older RFC 2396 that it replaces (both are IETF RFCs rather than W3C specifications, and RFC 2396 is believed to have a number of ambiguities and contradictions on how empty-string URLs should be handled in various situations).
From looking at those RFCs, I can only assume that the reason this is necessary for encodeURL but not for encodeRedirectURL is related to empty-string URLs being “same document” references. Whilst a sendRedirect instructs the client to actually issue the request, for all other uses of URLs the client can potentially treat such “same document” references as not requiring any actual request to the server. This may be a reason for ensuring that a non-empty URL is used when including a session ID in any URL other than for a sendRedirect.
Well, that’s my current theory… but it’s all very unclear (at least to me), and the more I look at these RFCs and think it through, the less clear I am as to why encodeURL needs to do this whilst encodeRedirectURL doesn’t. For example:
- Using the absolute URL rather than an empty string still results in a URL that refers to the same resource, and which might therefore meet the RFC 3986 “equivalence” rules and still be a “same document” reference anyway.
- Although this is done for empty-string URLs, it is not done for any other “same document” references (for example, URLs that start with a “#” and thus consist of a “fragment identifier” on its own).
- This is only done when a session ID is actually being inserted. If no session ID is being inserted (for whatever reason), encodeURL returns the empty string unchanged, so the use of an empty-string URL is clearly not a problem in itself.
- The code that actually inserts the session ID does not do so if the given URL has nothing prior to any query string or fragment identifier (with a comment “jsessionid can’t be first”), so this is possibly just a way to ensure that the session ID is really inserted. But in that case, it’s unclear why this is necessary for encodeURL but not for encodeRedirectURL.
So although this explains what the actual difference is between encodeURL and encodeRedirectURL, at least in the reference implementation, I still don’t understand why that difference is necessary or what specific rules in the “W3c spec” it addresses. Maybe I’m missing something, or maybe I just haven’t read the relevant specifications thoroughly enough. I’d certainly welcome any more complete explanation anyone can give.
In the meantime, I guess the lesson is that empty-string URLs are best avoided – and it’s fairly hard to imagine any situation where an empty-string must be passed to encodeURL or encodeRedirectURL and no equivalent non-empty value can be used instead.
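One way to follow that advice is to substitute an explicit value before calling either method. The helper below is only a sketch: the class and method names are hypothetical, and what counts as an “equivalent non-empty value” is left to the caller (it is an assumption that one is available).

```java
// Hypothetical helper: never hand an empty-string URL to encodeURL or
// encodeRedirectURL; use an explicit, caller-supplied equivalent instead.
final class UrlTargets {

    // Returns url unless it is null or empty, in which case the explicit
    // fallback is returned. The fallback itself must be non-empty.
    static String orExplicit(String url, String fallback) {
        if (fallback == null || fallback.isEmpty()) {
            throw new IllegalArgumentException("fallback must be a non-empty URL");
        }
        return (url == null || url.isEmpty()) ? fallback : url;
    }
}
```

In a servlet this might be used as response.encodeURL(UrlTargets.orExplicit(target, request.getRequestURI())), although whether getRequestURI is the right equivalent once a request has been forwarded is exactly the kind of question raised earlier.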
There is also a separate issue of what the rules should be for whether to insert the session ID or not (in particular, so as to not expose the session ID to other servers or applications, whilst still catering for session IDs that are valid across multiple servlet contexts), but I’ll leave that for another time.