objective-c swift nsurl percent-encoding

Inconsistencies in URL encoding methods across Objective-C and Swift


I have the following Objective-C code:

[@"http://www.google.com" stringByAddingPercentEncodingWithAllowedCharacters:[NSCharacterSet URLPathAllowedCharacterSet]];
// http%3A//www.google.com

And yet, in Swift:

"http://www.google.com".addingPercentEncoding(withAllowedCharacters: .urlPathAllowed)
// http://www.google.com

To what can I attribute this discrepancy?

And for extra credit, can I rely on this code to percent-encode reserved URL path characters when I pass it a full URL like this?


Solution

  • The issue rests in the difference between the NSString method stringByAddingPercentEncodingWithAllowedCharacters: and the String method addingPercentEncoding(withAllowedCharacters:), and this behavior has changed from version to version. (It looks like the latest beta of iOS 11 restores the behavior we used to see.)

    I believe the root of the issue rests in the particulars of how paths are percent encoded. Section 3.3 of RFC 3986 says that colons are permitted in paths except in the first segment of a relative path.

    The NSString method captures this notion, e.g. imagine a path whose first directory was foo: (with a colon) and a subdirectory of bar: (also with a colon):

    NSString *string = @"foo:/bar:";
    NSCharacterSet *cs = [NSCharacterSet URLPathAllowedCharacterSet];
    NSLog(@"%@", [string stringByAddingPercentEncodingWithAllowedCharacters:cs]);
    

    That results in:

    foo%3A/bar:

    The : in the first segment of the path is percent encoded, but the : in subsequent segments is not. This captures the logic of how to handle colons in relative paths per RFC 3986.
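    To make that rule concrete, here is a minimal sketch of the position-sensitive logic. It is not Foundation's actual implementation; the helper name escapingColonsInFirstSegment is hypothetical, and it handles only the colon, nothing else:

```swift
import Foundation

// Hypothetical helper (not part of Foundation): escapes ":" only in the
// first segment of a relative path, per section 3.3 of RFC 3986.
func escapingColonsInFirstSegment(_ path: String) -> String {
    // No "/" at all: the whole string is the first segment.
    guard let slash = path.firstIndex(of: "/") else {
        return path.replacingOccurrences(of: ":", with: "%3A")
    }
    // Escape colons before the first "/", leave the rest untouched.
    let first = path[..<slash].replacingOccurrences(of: ":", with: "%3A")
    return first + String(path[slash...])
}

print(escapingColonsInFirstSegment("foo:/bar:"))  // foo%3A/bar:
```

    A path that starts with / has an empty first segment, so this sketch leaves its colons alone.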

    The String method addingPercentEncoding(withAllowedCharacters:), however, does not do this:

    let string = "foo:/bar:"
    os_log("%@", string.addingPercentEncoding(withAllowedCharacters: .urlPathAllowed)!)
    

    Yields:

    foo:/bar:

    Clearly, the String method does not attempt that position-sensitive logic. This implementation is more in keeping with the name of the method (it considers solely what characters are "allowed" with no special logic that tries to guess, based upon where the allowed character appears, whether it's truly allowed or not.)


    I gather that you are saddled with the code supplied in the question, but we should note that this behavior of percent escaping colons in relative paths, while it explains what you experienced, is not really relevant to your immediate problem. The code you have been provided is simply incorrect. It attempts to percent encode a URL as if it were just a path. But it’s not a path; it’s a URL, which is a different thing with its own rules.

    The deeper insight in percent encoding URLs is to acknowledge that different components of a URL allow different sets of characters, i.e. they require different percent encoding. That’s why NSCharacterSet has so many different URL-related character sets.

    You really should percent encode the individual components, each with the character set allowed for that type of component. Only when the individual components are percent encoded should they be concatenated to form the whole URL.
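    As a rough sketch of encoding per component, assuming a hypothetical path segment and query value (neither comes from the question), and a custom character set for query values that I derive from .urlQueryAllowed by removing the query delimiters:

```swift
import Foundation

// Hypothetical inputs: a path segment and a query value that both need escaping.
let segment = "report #1"
let value = "bar & baz"

// .urlQueryAllowed describes a whole query string, so for a single *value*
// we also exclude "&" and "=" (query delimiters) and "+" (often read as a space).
var queryValueAllowed = CharacterSet.urlQueryAllowed
queryValueAllowed.remove(charactersIn: "&=+")

// Encode each component with the set that applies to it...
let encodedSegment = segment.addingPercentEncoding(withAllowedCharacters: .urlPathAllowed)!
let encodedValue = value.addingPercentEncoding(withAllowedCharacters: queryValueAllowed)!

// ...and only then concatenate the pieces into the full URL string.
let urlString = "http://httpbin.org/" + encodedSegment + "?foo=" + encodedValue
print(urlString)  // http://httpbin.org/report%20%231?foo=bar%20%26%20baz
```

    Note that the scheme and host are never passed through the path character set, which is the mistake in the question's code.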

    Alternatively, NSURLComponents is designed precisely for this purpose, getting you out of the weeds of percent-encoding the individual components yourself. For example:

    var components = URLComponents(string: "http://httpbin.org/post")!
    let foo = URLQueryItem(name: "foo", value: "bar & baz")
    let qux = URLQueryItem(name: "qux", value: "42")
    components.queryItems = [foo, qux]
    
    let url = components.url!
    

    That yields the following, with the & and the two spaces within the foo value properly percent escaped, while the & separating foo and qux is correctly left as-is:

    http://httpbin.org/post?foo=bar%20%26%20baz&qux=42

    It’s worth noting, though, that NSURLComponents has a small yet fairly fundamental flaw: if your query values (NSURLQueryItem) could contain + characters, most web services need them percent escaped (they interpret an unescaped + as a space), but NSURLComponents won’t escape them. If your URL has query components and those query values might include + characters, I’d advise against NSURLComponents and would instead advise percent encoding the individual components of a URL yourself.
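    If dropping URLComponents altogether is not practical, one workaround you may see is to patch the already-encoded query afterwards. This is only a sketch, and the value "a+b" is a made-up example:

```swift
import Foundation

var components = URLComponents(string: "http://httpbin.org/post")!
components.queryItems = [URLQueryItem(name: "foo", value: "a+b")]

// Replace any literal "+" left in the encoded query with its escape, "%2B".
// This assumes no "+" in the query was meant to stand for a space.
components.percentEncodedQuery = components.percentEncodedQuery?
    .replacingOccurrences(of: "+", with: "%2B")

let url = components.url!
print(url.absoluteString)  // http://httpbin.org/post?foo=a%2Bb
```

    Because the replacement runs on percentEncodedQuery rather than on the raw values, the % it introduces is not escaped a second time.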