Difference between HttpServletResponse.encodeURL and encodeRedirectURL

11 02 2007

When supporting session-tracking by means of URL-rewriting (for which the session ID is added into URLs as a path parameter when necessary), the Javadoc for HttpServletResponse makes it clear that encodeRedirectURL should be used for URLs that are being passed to sendRedirect, but encodeURL should be used for all other URLs. It also says that the reason for having two separate methods is that the rules for deciding whether or not to insert the session ID into the given URL “can differ” between the two methods.

However, the Javadoc does not explain what the relevant rules are, nor why the two methods may need to use different rules, nor whether any specific differences are required between the two methods or whether this is merely a provision through which servlet-container implementations can provide container-specific differences if necessary (or through which differences could be introduced in future).

It has taken a bit of hunting around to find any reasonable explanation of the actual differences between encodeURL and encodeRedirectURL. So even though I’m not convinced I’ve got the entire story or understood all the implications, it seems worth summarising my findings here, so as to provide one more place where people can find details of this.

The only specific answers I could find are from looking at the source code of the Tomcat and Glassfish “reference implementations”, and in a comp.lang.java.help message by Saad Malik, which itself is based on examination of the Tomcat source code.

Looking at the Glassfish source code (as the “latest” reference implementation), the only actual difference between encodeURL and encodeRedirectURL seems to be:

  • When inserting the session ID into an empty-string URL, encodeURL first converts the empty string into an absolute URL and inserts the session ID into that absolute URL, whereas encodeRedirectURL treats an empty-string URL just like any other. That is, encodeRedirectURL attempts to insert the session ID into the empty string itself, but the code that actually does the insertion then returns the empty string unchanged, because it never inserts the session ID when the given string has nothing prior to any query string or fragment identifier.
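To make that difference concrete, here is a minimal sketch in plain Java that models the behaviour described above. It is not the Glassfish or Tomcat code: the class name SessionUrlSketch, the method signatures, the requestUrl parameter (standing in for the request's URL without query string or fragment) and the use of a null session ID to mean "no rewriting needed" are all assumptions made purely for illustration.

```java
public final class SessionUrlSketch {

    /** Inserts ";jsessionid=<id>" before any query string or fragment identifier. */
    public static String insertSessionId(String url, String sessionId) {
        if (url == null || sessionId == null) {
            return url;
        }
        // The path part ends at the first '?' or '#', whichever comes first.
        int end = url.length();
        int question = url.indexOf('?');
        int hash = url.indexOf('#');
        if (question >= 0) {
            end = Math.min(end, question);
        }
        if (hash >= 0) {
            end = Math.min(end, hash);
        }
        // Nothing before the query string or fragment ("jsessionid can't be first"):
        // return the URL unchanged.
        if (end == 0) {
            return url;
        }
        return url.substring(0, end) + ";jsessionid=" + sessionId + url.substring(end);
    }

    /** Model of encodeURL: an empty URL is replaced by the absolute request URL first. */
    public static String encodeURL(String url, String requestUrl, String sessionId) {
        if (sessionId == null) {
            return url; // assumption: null models "no session ID needs inserting"
        }
        String target = url.isEmpty() ? requestUrl : url;
        return insertSessionId(target, sessionId);
    }

    /** Model of encodeRedirectURL: an empty URL is treated like any other URL. */
    public static String encodeRedirectURL(String url, String requestUrl, String sessionId) {
        if (sessionId == null) {
            return url;
        }
        return insertSessionId(url, sessionId);
    }

    public static void main(String[] args) {
        String requestUrl = "http://example.com/app/page"; // hypothetical request URL
        String id = "ABC123";                              // hypothetical session ID

        // prints "http://example.com/app/page;jsessionid=ABC123"
        System.out.println(encodeURL("", requestUrl, id));
        // prints an empty line: the empty string comes back unchanged
        System.out.println(encodeRedirectURL("", requestUrl, id));
        // prints "list;jsessionid=ABC123?x=1" for both methods
        System.out.println(encodeURL("list?x=1", requestUrl, id));
        System.out.println(encodeRedirectURL("list?x=1", requestUrl, id));
    }
}
```

In this model the two methods share the same decision (the null check) and the same insertion routine, and differ only in the substitution made for the empty string, which is the one difference I could find in the reference implementation.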

Note that:

  • This somewhat contradicts the Javadoc statement that the rules for deciding whether to insert the session ID can differ between the two methods: in this implementation the decision rules are exactly the same, and only the way the session ID is inserted differs.
  • There is a comment in the encodeURL source code that justifies its use of the absolute URL if the given URL is an empty string by saying “W3c spec clearly said”. However, this comment doesn’t identify the particular W3C specification involved or on what basis it is believed to require this, so it remains conjecture as to exactly why this is required (see below).
  • As this is just code in the reference implementation, it remains unclear to what extent this is the required behaviour of these methods and to what extent it just happens to be what that particular implementation has chosen to do (and it is not even clear whether and why this behaviour is correct).
  • The conversion of the given empty-string URL to an absolute URL is based on the URL of the corresponding request (excluding any query string or fragment identifier). It is somewhat unclear whether this should take into account any RequestDispatcher “forwarding” of the request, or whether it should be for the original URL of the request as used by the client when making the request. Given the intended usage of the encodeURL method, I’d assume that the empty-string URL should be taken as relative to the resource that was requested by the client, as per the original request’s getRequestURL (that is, regardless of any subsequent “wrapping” of the request due to any “forwarding” or other such adjustment of the request during its processing). But that’s just my own assumption.

    As noted above, the justification given for encodeURL converting empty-string URLs into absolute URLs when inserting session IDs into them is the “W3c spec”, but no precise details are given. This presumably refers to RFC 3986 (strictly an IETF rather than a W3C document), or more likely to the older RFC 2396 that RFC 3986 obsoletes (and which is believed to have a number of ambiguities and contradictions on how empty-string URLs should be handled in various situations).

    From looking at those RFCs, I can only assume that the reason this is necessary for encodeURL but not for encodeRedirectURL is related to empty-string URLs being “same document” references. Whilst a sendRedirect instructs the client to actually issue the request, for all other uses of URLs the client can potentially treat such “same document” references as not requiring any actual request to the server. This may be a reason for ensuring that a non-empty URL is used when including a session ID in any URL other than for a sendRedirect.

    Well, that’s my current theory… but it’s all very unclear (at least to me), and the more I look at these RFCs and think it through, the less clear I am as to why encodeURL needs to do this whilst encodeRedirectURL doesn’t. For example:

    • Using the absolute URL rather than an empty string still results in a URL that refers to the same resource, and which might therefore meet the RFC 3986 “equivalence” rules and still be a “same document” reference anyway.
    • Although this is done for empty-string URLs, it is not done for any other “same document” references (for example, URLs that start with a “#” and thus consist of a “fragment identifier” on its own).
    • This is only done when a session ID is actually being inserted – if no session ID is being inserted (for whatever reason), encodeURL returns the empty string unchanged, so the use of an empty-string URL is clearly not a problem in itself.
    • The code that actually inserts the session ID does not do so if the given URL has nothing prior to any query string or fragment identifier (with a comment “jsessionid can’t be first”), so this is possibly just a way to ensure that the session ID is really inserted. But in that case, it’s unclear why this is necessary for encodeURL but not for encodeRedirectURL.

      So although this explains what the actual difference is between encodeURL and encodeRedirectURL, at least in the reference implementation, I still don’t understand why that difference is necessary or what specific W3C rules it addresses. Maybe I’m missing something, or maybe I just haven’t read the relevant specifications thoroughly enough. I’d certainly welcome any more complete explanation anyone can give.

      In the meantime, I guess the lesson is that empty-string URLs are best avoided – and it’s fairly hard to imagine any situation where an empty-string must be passed to encodeURL or encodeRedirectURL and no equivalent non-empty value can be used instead.

      There is also a separate issue of what the rules should be for whether to insert the session ID or not (in particular, so as to not expose the session ID to other servers or applications, whilst still catering for session IDs that are valid across multiple servlet contexts), but I’ll leave that for another time.

      3 responses

      10 10 2007
      Frank

      As far as I understand, the two methods encodeURL and encodeRedirectURL also enable you (as a software implementer) to customize URLs.

      An example would be a servlet Filter. You might want to write a Filter that ensures no session ID is sent anywhere other than to your own servlet. Overriding encodeRedirectURL allows you to exclude the session ID for all (or only a subset) of the redirected URLs.

      This would explain why the Javadoc says the reason for having two separate methods is that the rules for deciding whether or not to insert the session ID into the given URL “can differ” between the two methods. The standard implementation does nothing very exciting (see your findings above), but it allows you to implement your own rules.

      Best regards

      Frank

      11 10 2007
      closingbraces

      Frank,

      That’s an interesting idea – use a filter to wrap the response before processing it, with the wrapper overriding these methods to apply your own decision or customization for all URLs passed to these two API methods, which in theory should be all URLs returned by the application.
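As a rough illustration of that wrapper idea, here is a minimal sketch in plain Java. It does not use the real Servlet API: UrlEncodingResponse is a hypothetical stand-in for the two HttpServletResponse methods, ContainerResponse is a deliberately simplified stand-in for the container's own response (it just appends the session ID and ignores query strings), and the same-host rule and the example.com host name are assumptions chosen purely for illustration. In a real application the wrapper would more likely extend HttpServletResponseWrapper and be installed from a Filter's doFilter method.

```java
import java.net.URI;
import java.net.URISyntaxException;

public final class RedirectFilterSketch {

    /** Hypothetical stand-in for the two HttpServletResponse methods discussed here. */
    interface UrlEncodingResponse {
        String encodeURL(String url);
        String encodeRedirectURL(String url);
    }

    /** Simplified stand-in for the container's response: always appends the session ID. */
    static final class ContainerResponse implements UrlEncodingResponse {
        private final String sessionId;

        ContainerResponse(String sessionId) {
            this.sessionId = sessionId;
        }

        public String encodeURL(String url) {
            return url + ";jsessionid=" + sessionId;
        }

        public String encodeRedirectURL(String url) {
            return url + ";jsessionid=" + sessionId;
        }
    }

    /** The wrapper a filter might install: redirects to other hosts get no session ID. */
    static final class SameHostRedirectWrapper implements UrlEncodingResponse {
        private final UrlEncodingResponse delegate;
        private final String ownHost;

        SameHostRedirectWrapper(UrlEncodingResponse delegate, String ownHost) {
            this.delegate = delegate;
            this.ownHost = ownHost;
        }

        public String encodeURL(String url) {
            // Ordinary links keep the container's normal behaviour.
            return delegate.encodeURL(url);
        }

        public String encodeRedirectURL(String url) {
            // Custom rule applied to redirects only.
            return isExternal(url) ? url : delegate.encodeRedirectURL(url);
        }

        private boolean isExternal(String url) {
            try {
                String host = new URI(url).getHost();
                // Relative URLs have no host and so stay within the application.
                return host != null && !host.equalsIgnoreCase(ownHost);
            } catch (URISyntaxException e) {
                // Unparseable URL: err on the side of not exposing the session ID.
                return true;
            }
        }
    }

    public static void main(String[] args) {
        UrlEncodingResponse response =
                new SameHostRedirectWrapper(new ContainerResponse("ABC123"), "example.com");

        // prints "http://other.example.org/x" (no session ID added)
        System.out.println(response.encodeRedirectURL("http://other.example.org/x"));
        // prints "/app/home;jsessionid=ABC123"
        System.out.println(response.encodeRedirectURL("/app/home"));
        // prints "http://other.example.org/x;jsessionid=ABC123" (non-redirect rule untouched)
        System.out.println(response.encodeURL("http://other.example.org/x"));
    }
}
```

Note that this sketch deliberately applies the custom rule to redirects only, which is exactly the asymmetry between the two methods that is being discussed.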

      I can imagine scenarios where someone would find such a filter useful, especially if they’re willing to ignore the strict definition of these methods and just use them for arbitrary customization of the URLs (presumably in addition to calling the inherited method to get normal URL-rewriting for session-tracking). For example, to transparently route all URLs through some statistics-gathering site.

      But I’m still struggling to come up with situations where you’d need different rules for whether to insert the session ID when redirecting to a URL as opposed to giving a non-redirect link to the same URL. In either case the URL could be for the originating servlet, or another in the same application, or another application on the same container, or an entirely external URL.

      I can see that the decision might depend on the nature of the URL, but not on whether it is being returned to the browser as a URL for the browser to follow automatically as opposed to one that the end-user can optionally follow by clicking on the link. Especially when even that distinction depends on assumptions about the nature of the receiving software and how it is handling redirects and other URLs (e.g. maybe redirects are being blocked and displayed for manual approval; maybe it’s a spider that’s following all links automatically).

      Are you actually using this technique? Or does anyone else have any examples? It’s not especially relevant to the meaning of the Servlet API Javadoc or servlet-container implementations of these methods, but I’d be curious to know if anybody is using this technique and for what purpose, and with what differences between the redirect and non-redirect logic.

      Mike

