url specifications rfc2396

Why are characters like @, $, :, and ; reserved in a URL query component?


I'm reading RFC 2396 on URLs, which says:

Many URI include components consisting of or delimited by, certain special characters. These characters are called "reserved", since their usage within the URI component is limited to their reserved purpose.

But the section on the query part of a URL (between ? and #) says:

3.4. Query Component

The query component is a string of information to be interpreted by the resource.

query         = *uric

Within a query component, the characters ";", "/", "?", ":", "@", "&", "=", "+", ",", and "$" are reserved.

What is the "reserved purpose" of each of those characters? I understand what &, =, and + are used for in the query, but what about the others?

More practically, should I always URL-encode those characters when they appear in the query? The browsers and servers I've seen handle :, ;, and other such characters without their being encoded.


Solution

  • I think that Section 2.2 of RFC 3986, which obsoletes RFC 2396, has a possible explanation. I quote:

    These characters are called "reserved" because they may (or may not) be defined as delimiters by the generic syntax, by each scheme-specific syntax, or by the implementation-specific syntax of a URI's dereferencing algorithm.

    I think that what Berners-Lee, et al. are getting at here is that even though not all reserved characters are used by the generic syntax described in the RFC, the authors wanted to leave enough latitude for future schemes or implementation-specific code to use those characters as delimiters as they see fit.

    As to whether you should encode those characters, my opinion is that you should use a percent-encoding implementation that follows the standard, rather than a non-standard one or one you roll yourself. For instance, if you are using a language like C# or Python, the libraries that come with those languages include a standards-compliant implementation of the algorithm. For more details, Section 2.4 of RFC 3986 covers when to encode or decode.
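
As a rough illustration of the library approach, here is a minimal Python sketch using the standard `urllib.parse` module. The parameter names and values are made up for the example; the point is only that reserved characters appearing inside the data get percent-encoded, so they cannot be mistaken for delimiters, and that decoding recovers the original values.

```python
from urllib.parse import parse_qs, quote, urlencode

# Hypothetical query parameters whose values contain reserved characters.
params = {"email": "user@example.com", "range": "a:b;c", "price": "$5 & up"}

# urlencode() percent-encodes each key and value (spaces become "+"),
# then joins the pairs with "=" and "&".
query = urlencode(params)
print(query)

# quote() with safe="" percent-encodes every reserved character in one value.
encoded_value = quote("a:b;c@d$e", safe="")
print(encoded_value)

# Round trip: parsing the encoded query recovers the original data.
decoded = {key: values[0] for key, values in parse_qs(query).items()}
print(decoded == params)
```

Whether a server would also accept some of these characters unencoded depends on that server; encoding them in data is the conservative choice.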