Context: I am creating an app that stores its data in location.hash. I want to percent-encode as few characters as possible to keep the URL as legible as possible.
As explained in this answer, the reserved characters are different for each segment of the URL. So which characters are allowed unencoded in the URL fragment (location.hash) specifically?
Related post: Unicode characters in URLs
According to RFC 3986, "Uniform Resource Identifier (URI): Generic Syntax" (Section 3.5 for the fragment, with the full grammar collected in Appendix A):
fragment = *( pchar / "/" / "?" )
pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
pct-encoded = "%" HEXDIG HEXDIG
sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
Unpacking all that, and ignoring percent-encoding, I find the following set of characters:
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-._~!$&'()*+,;=:@/?
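As a rough sketch, the grammar above can be turned into a regular expression that checks whether a string is already a valid fragment. The names FRAGMENT_LITERALS, FRAGMENT_RE and isValidFragment are made up for this illustration and do not come from the RFC or any library:

```javascript
// Characters a fragment may contain literally, per the grammar quoted above
// (unreserved, sub-delims, ":", "@", "/" and "?"). Hypothetical helper names.
const FRAGMENT_LITERALS = "A-Za-z0-9\\-._~!$&'()*+,;=:@/?";

// A valid fragment is any sequence of those characters or %XX escapes.
const FRAGMENT_RE = new RegExp(
  `^(?:[${FRAGMENT_LITERALS}]|%[0-9A-Fa-f]{2})*$`
);

function isValidFragment(fragment) {
  // Expects the fragment without its leading "#".
  return FRAGMENT_RE.test(fragment);
}
```

With this, isValidFragment("a=1&b=(x,y)/z?q") is true, while strings containing a space, a "#", a non-ASCII letter, or a "%" that does not start a two-digit hex escape are rejected.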
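The RFC does not mandate a particular character encoding and deals in characters only (not bytes). However, ALPHA, as used in Section 2.3, is the ABNF core rule and covers ASCII letters only, i.e. the 26 letters of the Latin alphabet in upper and lower case.
Any non-ASCII character must therefore be percent-encoded, and so must any ASCII character outside the set above, such as a space, "#", or a literal "%".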
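For the original goal of encoding as little as possible, one possible approach (a sketch, not the only way) is to let encodeURIComponent escape everything and then restore the extra characters that the fragment grammar allows. The function name encodeFragment is invented for this example, and the use of UTF-8 for non-ASCII characters is simply what encodeURIComponent does, not something RFC 3986 requires:

```javascript
// Hypothetical helper: percent-encode only what a fragment cannot hold literally.
function encodeFragment(text) {
  // encodeURIComponent leaves A-Z a-z 0-9 - _ . ! ~ * ' ( ) untouched and
  // encodes everything else, with non-ASCII characters as UTF-8 bytes.
  return encodeURIComponent(text).replace(
    // Restore $ & + , / : ; = ? @, which the fragment grammar also allows.
    /%(24|26|2B|2C|2F|3A|3B|3D|3F|40)/g,
    (_, hex) => String.fromCharCode(parseInt(hex, 16))
  );
}
```

For example, encodeFragment("a=1&b=2/c?d") comes back unchanged, and encodeFragment("café & crème") yields "caf%C3%A9%20&%20cr%C3%A8me". The result can be read back with decodeURIComponent, since a literal "%" in the input is always escaped as "%25".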