In macOS 10.12, `NSURLCanonicalPathKey` was added to `NSURL`. The documentation states:

> The URL's path as a canonical absolute file system path.

Outside of that, the only other documentation/information I've seen of it is from a Swift Forum post that states:

> You might want to take a look at .canonicalPathKey (NSURLCanonicalPathKey). On Apple platforms a lot of the standard UNIXy paths exist within /private/, with corresponding symlinks from the root. So /etc/ is actually /private/etc/. If you don’t canonicalise the paths you can get tripped up by this.

This seems like a pretty big deal to me, yet I'm surprised it was only introduced in 10.12. I've only ever relied on `NSURLPathKey`, `.path` or bookmark data for resolving URLs and never had a problem.
Should I now be using the canonical path everywhere I previously used the standard path value?

- If I'm storing path information in a database as a string, should I store the value of `.path` or `NSURLCanonicalPathKey`?
- If I'm converting an `NSURL` to a string representation for use in a C/C++ library that requires a file path, should I use the canonical path representation?
- If you're displaying the path of a file to the user, should you show the canonical path?
- How does `NSURLCanonicalPathKey` compare to `URLByStandardizingPath` and `URLByResolvingSymlinksInPath`, which seem to sort of do the same thing or the opposite thing...(?)

This is on macOS 10.14 and I'm only considering URLs that point to files or folders. I'm aware that bookmark data should probably be stored in a database rather than paths.
It depends on how you plan to use the path:

- […] `[NSURL fileURLWithPath:]`, then you can keep using the regular path as you received it. Usually you have the path because the user gave it to you in some way, and then it's best not to alter it.
- […] `[NSURL isEqual:]` will give you false. If you do not like that, you'll have to canonicalize them.

Unicode normalization may also be significant. E.g., if a file or folder name uses precomposed (NFC) characters, the `NSURL` methods will turn them into NFD strings. OTOH, the BSD/POSIX functions won't do that. So if you, for example, get paths from a shell command and then compare them to paths you have from `NSURL`s, they may not compare as equal because one uses NFC and the other NFD characters. Ideally, if `NSURL` or `NSFileManager` gets involved with the paths, you should also first pass your BSD paths through `NSURL` so that both kinds of paths end up in the same composition format.
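
To illustrate the comparison problem, here is a minimal sketch. Note that it does not route the paths through `NSURL` as suggested above; instead it swaps in explicit normalization with `NSString`'s `decomposedStringWithCanonicalMapping` before comparing. The helper name `PathsEqualIgnoringComposition` and the sample paths are made up for illustration, and the paths do not need to exist.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: compares two path strings after bringing both into
// the same (decomposed) Unicode normalization form.
static BOOL PathsEqualIgnoringComposition(NSString *a, NSString *b) {
    return [a.decomposedStringWithCanonicalMapping
        isEqualToString:b.decomposedStringWithCanonicalMapping];
}

int main(void) {
    @autoreleasepool {
        // Same visible name, two different encodings of "ü".
        NSString *nfc = @"/tmp/precomposed_\u00fc";   // one code point
        NSString *nfd = @"/tmp/precomposed_u\u0308";  // "u" + combining diaeresis

        NSLog(@"isEqualToString: on the raw strings: %d", [nfc isEqualToString:nfd]);
        NSLog(@"after normalizing both strings:      %d",
              PathsEqualIgnoringComposition(nfc, nfd));
    }
    return 0;
}
```

This only covers composition. Case differences and symlinks are separate concerns.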
| Input | `URLByStandardizingPath` | `NSURLCanonicalPathKey` |
|---|---|---|
| `/private/var` | `/var` | `/private/var` |
| `/var` | `/var` | `/private/var` |
The following example uses a prepared APFS volume that contains file names with both a precomposed and a decomposed representation of the letter "ü", along with symlinks. You can download the disk image file here.
The directory layout is as follows:

    $ cd /Volumes/Canonical_Normalize_Test/
    $ ls -lR
    total 24
    -rw-r--r-- 1 user staff 19 Dec 29 19:27 decomposed_ü
    -rw-r--r-- 1 user staff 19 Dec 29 19:27 precomposed_ü
    drwxr-xr-x 4 user staff 128 Dec 29 19:36 symlink_target_dir
    lrwxr-xr-x 1 user staff 18 Dec 29 19:36 symlink_to_dir -> symlink_target_dir
    -rwxr-xr-x@ 1 user staff 763 Dec 15 16:28 unicode_composition_check.sh

    ./symlink_target_dir:
    total 0
    lrwxr-xr-x 1 user staff 17 Dec 29 19:36 decomposed_ü -> ../decomposed_ü
    lrwxr-xr-x 1 user staff 17 Dec 29 19:36 precomposed_ü -> ../precomposed_ü

The file `unicode_composition_check.sh` is a script that creates the two "...ü" files, one with its name in NFD, the other in NFC (unfortunately, the script's name is misleading).
Input is:

    /Volumes/Canonical_Normalize_Test/symlink_to_dir/precomposed_\U00fc

(That is, the path includes a directory symlink and uses the actual file's Unicode composition: the "ü" in the target file name is precomposed.)

| Method | Result |
|---|---|
| `fileSystemRepresentation` | `/Volumes/Canonical_Normalize_Test/symlink_to_dir/precomposed_u\U0308` |
| `URLByStandardizingPath` | `/Volumes/Canonical_Normalize_Test/symlink_to_dir/precomposed_u\U0308` |
| `NSURLCanonicalPathKey` | `/Volumes/Canonical_Normalize_Test/symlink_target_dir/precomposed_u\U0308` |
| `URLByResolvingSymlinksInPath` | `/Volumes/Canonical_Normalize_Test/precomposed_u\U0308` |
We see that the methods give different results:

- They all appear to normalize the path to NFD, i.e. the "ü" gets decomposed in all cases. That's necessary and normal for regular case-insensitive volumes, as the lookup of file names is normalization-insensitive. However, for case-sensitive volumes the composition must not be changed, and while I've not tested this, I assume that all the above functions detect the volume's case sensitivity mode and behave accordingly.
- Only `NSURLCanonicalPathKey` gives the result that is needed if we want to re-identify the target item later by path (regardless of which Unicode composition is used and whether the path includes symlinks to a directory): it resolves the directory symlink but not the final symlink inside `symlink_target_dir`. If it did resolve the final path element (like `URLByResolvingSymlinksInPath` does), you would not be able to target symlink files.
- `NSString`'s `fileSystemRepresentation` does not alter the path apart from normalizing it, whereas `NSURL`'s `URLByStandardizingPath` alters the path in some cases (e.g. by removing "/private" from certain root folders).
- Only `NSURLCanonicalPathKey` will fix upper/lower case based on the actual on-disk path. For example, a URL created from "/applications" will not be turned into the actual "/Applications" path by any of the other functions.

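
If you want to check these results on your own system, a small command-line tool along the following lines should do. Treat it as a sketch: the `Escaped` helper is a hypothetical convenience that prints non-ASCII characters as backslash escapes so the composition stays visible, and the file name `paths.m` below is just an example. The output you get may differ from the table.

```objc
#import <Foundation/Foundation.h>
#include <stdio.h>
#include <string.h>

// Hypothetical helper: renders non-ASCII characters as backslash escapes so
// that precomposed and decomposed forms can be told apart in the terminal.
static const char *Escaped(NSString *s) {
    return [s cStringUsingEncoding:NSNonLossyASCIIStringEncoding] ?: "(null)";
}

int main(int argc, const char *argv[]) {
    @autoreleasepool {
        for (int i = 1; i < argc; i++) {
            NSString *input = [[NSFileManager defaultManager]
                stringWithFileSystemRepresentation:argv[i] length:strlen(argv[i])];
            NSURL *url = [NSURL fileURLWithPath:input];

            // NSURLCanonicalPathKey is a resource value (macOS 10.12+), so it is
            // read from the file system and can fail with an error.
            NSString *canonical = nil;
            NSError *error = nil;
            if (![url getResourceValue:&canonical
                                forKey:NSURLCanonicalPathKey
                                 error:&error]) {
                canonical = [NSString stringWithFormat:@"<error: %@>",
                             error.localizedDescription];
            }

            NSString *fsRep =
                [NSString stringWithUTF8String:input.fileSystemRepresentation];

            printf("input:                        %s\n", Escaped(input));
            printf("fileSystemRepresentation:     %s\n", Escaped(fsRep));
            printf("URLByStandardizingPath:       %s\n",
                   Escaped(url.URLByStandardizingPath.path));
            printf("NSURLCanonicalPathKey:        %s\n", Escaped(canonical));
            printf("URLByResolvingSymlinksInPath: %s\n\n",
                   Escaped(url.URLByResolvingSymlinksInPath.path));
        }
    }
    return 0;
}
```

Built with something like `clang -fobjc-arc -framework Foundation paths.m -o paths`, it takes one or more paths as arguments, e.g. `./paths /var /private/var`.
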
If you need to re-identify the path later, no matter which representation (normalization, symlinks to dirs) is used, use either `NSURLCanonicalPathKey` if you need to retain the actual item, even if it's a symlink, or `URLByResolvingSymlinksInPath` to always identify the target of any symlinks given to you.
Note, however (see first example), that if you use `URLByResolvingSymlinksInPath`, "/private/var/tmp" etc. will be turned into "/var/tmp" etc., which is unusual because the result then still contains a symlink (i.e. "/var").
Also keep in mind that the case may not be correct unless you get the canonical path. To compensate for that, comparing paths requires you to first check whether the path is on a case-insensitive volume, so that you use the correct comparison options. As an added complication, simply comparing paths with the "case insensitive" option may not be correct for some rare scripts on HFS+ volumes, because HFS+ uses an older Unicode standard whose rules differ from those used by current macOS versions.
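
A rough sketch of such a check, assuming you query the volume property `NSURLVolumeSupportsCaseSensitiveNamesKey` through a URL on the volume in question. The helper name `PathsMatchOnVolumeOfURL` is made up, it assumes both strings already use the same Unicode composition, and it does not address the HFS+ caveat just mentioned.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: compares two path strings using the case rules of the
// volume that `url` is located on. If the volume property cannot be read, it
// falls back to a case-sensitive comparison.
static BOOL PathsMatchOnVolumeOfURL(NSString *a, NSString *b, NSURL *url) {
    NSNumber *caseSensitive = nil;
    BOOL ok = [url getResourceValue:&caseSensitive
                             forKey:NSURLVolumeSupportsCaseSensitiveNamesKey
                              error:NULL];
    if (!ok || caseSensitive == nil) {
        return [a isEqualToString:b];
    }

    // Only the case handling depends on the volume here.
    NSStringCompareOptions options =
        caseSensitive.boolValue ? 0 : NSCaseInsensitiveSearch;
    return [a compare:b options:options] == NSOrderedSame;
}
```

For example, `PathsMatchOnVolumeOfURL(@"/applications", @"/Applications", [NSURL fileURLWithPath:@"/"])` asks the boot volume which rule applies before comparing.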
Lastly, if you just want to see whether two paths point to the same file, it's safer to use other means that do not rely on paths. See this answer. And if you need to persistently remember file locations, it's best to use bookmarks, so that the files are still found even if the user has renamed or moved them in the meantime.
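
As a sketch of both ideas: one path-independent way to compare items is `NSURLFileResourceIdentifierKey`, whose values can be compared with `isEqual:`. That this is what the linked answer recommends is an assumption on my part. The function names below are made up for illustration, and sandboxing (security-scoped bookmarks) is not considered.

```objc
#import <Foundation/Foundation.h>

// Returns YES if both URLs currently refer to the same file system item,
// judged by their resource identifiers rather than by their paths.
static BOOL URLsReferToSameItem(NSURL *a, NSURL *b) {
    id idA = nil;
    id idB = nil;
    if (![a getResourceValue:&idA forKey:NSURLFileResourceIdentifierKey error:NULL] ||
        ![b getResourceValue:&idB forKey:NSURLFileResourceIdentifierKey error:NULL]) {
        return NO; // at least one item could not be inspected
    }
    return idA != nil && [idA isEqual:idB];
}

// Creates bookmark data that can be stored (e.g. in a database) in place of a path.
static NSData *BookmarkForURL(NSURL *url, NSError **error) {
    return [url bookmarkDataWithOptions:0
         includingResourceValuesForKeys:nil
                          relativeToURL:nil
                                  error:error];
}

// Resolves stored bookmark data back into a URL. `isStale` is set to YES
// when the bookmark should be recreated from the returned URL.
static NSURL *URLFromBookmark(NSData *bookmark, BOOL *isStale, NSError **error) {
    return [NSURL URLByResolvingBookmarkData:bookmark
                                     options:0
                               relativeToURL:nil
                         bookmarkDataIsStale:isStale
                                       error:error];
}
```

Resource identifiers are only meant for comparisons while the system is running, so for anything you store persistently, use the bookmark data.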
Disclaimer: All these findings were determined empirically, tested on both macOS 10.13.6 and 11.1 (and the systems in between), so you may want to double-check them and leave a comment if you get different results.