My program is supposed to create two files at user-specified paths. I need to know whether the paths lead to the same location, so that I can exit with an error before I start changing the filesystem.
Because the paths come from the user, they are expected to be non-canonical and weird.
For example, they could be ./dir1/subdir/file and dir2/subdir/../subdir/file, where dir2 is a symlink to dir1 and subdir doesn't exist yet. The expected result is still true: they are equivalent.
std::filesystem::equivalent works only on files that already exist.
Is there any similar function without this limitation?
This is a surprisingly difficult problem to solve, and no single standard library function will do it.
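There are several cases that you need to worry about: relative versus absolute paths (foo versus ./foo), symlinks, .. components that follow path components which don't exist yet, and case-insensitive filesystems.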
std::filesystem::weakly_canonical will get you part of the way there, but not all the way by itself. For instance, it doesn't address the case where a bare relative path doesn't exist (e.g. foo won't canonicalize to the same thing as ./foo), and it doesn't even try to address case-sensitivity.
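To make this concrete, here is a minimal sketch of what weakly_canonical does with the two paths from the question. The helpers makeSandbox, questionExampleMatches and printBareRelative are hypothetical names introduced only for this illustration, and the sandbox lives in a wc_demo directory under the system temp directory (also an assumption). Creating the symlink may require extra privileges on some platforms.

```cpp
#include <cassert>
#include <filesystem>
#include <iostream>

namespace fs = std::filesystem;

// Hypothetical helper: builds <temp>/wc_demo with a real directory dir1 and
// a symlink dir2 -> dir1. "subdir" is deliberately not created.
fs::path makeSandbox()
{
    fs::path root = fs::temp_directory_path() / "wc_demo";
    fs::remove_all(root);
    fs::create_directories(root / "dir1");
    fs::create_directory_symlink(root / "dir1", root / "dir2");
    assert(fs::is_symlink(root / "dir2"));
    return root;
}

// The question's example: the existing prefix (including the symlink) is
// resolved, and the non-existing remainder is normalized lexically.
bool questionExampleMatches(const fs::path& root)
{
    fs::path a = fs::weakly_canonical(root / "dir1/subdir/file");
    fs::path b = fs::weakly_canonical(root / "dir2/subdir/../subdir/file");
    return a == b;
}

// The bare relative case: prints both results so they can be compared by
// eye. No particular output is assumed here, because the result for a path
// with no existing prefix may differ between standard library implementations.
void printBareRelative()
{
    std::cout << fs::weakly_canonical("no_such_entry") << '\n'
              << fs::weakly_canonical("./no_such_entry") << '\n';
}
```

With absolute paths the question's example already compares equal; the bare relative case is the one to check on your own toolchain.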
Here's a canonicalize function that takes all of that into account. It still has some shortcomings, mainly around non-ASCII characters (e.g. the case-normalization doesn't work for 'É'), but it should work in most cases:
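#include <algorithm>
#include <filesystem>
#include <iterator>
#include <locale>
#include <utility>

namespace fs = std::filesystem;
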
// Split a path into the longest leading part that exists on disk and the
// remainder that does not exist yet
std::pair<fs::path, fs::path> splitExistingNonExistingParts(const fs::path& path)
{
    fs::path existingPart = path;
    while (!existingPart.empty() && !fs::exists(existingPart)) {
        existingPart = existingPart.parent_path();
    }
    return {existingPart, fs::relative(path, existingPart)};
}

fs::path toUpper(const fs::path& path)
{
    const fs::path::string_type& native = path.native();
    fs::path::string_type upper;
    upper.reserve(native.length());
    std::transform(
        native.begin(),
        native.end(),
        std::back_inserter(upper),
        [](auto c) { return std::toupper(c, std::locale()); }
    );
    return upper;
}

fs::path toLower(const fs::path& path)
{
    const fs::path::string_type& native = path.native();
    fs::path::string_type lower;
    lower.reserve(native.length());
    std::transform(
        native.begin(),
        native.end(),
        std::back_inserter(lower),
        [](auto c) { return std::tolower(c, std::locale()); }
    );
    return lower;
}

bool isCaseSensitive(const fs::path& path)
{
    // NOTE: This function assumes the path exists.
    const fs::path name = *(--path.end());
    fs::path upper = path.parent_path() / toUpper(name);
    fs::path lower = path.parent_path() / toLower(name);
    bool upperExists = fs::exists(upper);
    bool lowerExists = fs::exists(lower);
    if (upperExists != lowerExists) {
        // If one exists and the other doesn't, then they
        // must reference different files and therefore be
        // case-sensitive
        return true;
    }
    if (!upperExists) {
        // Neither variant exists although the (mixed-case) path does,
        // so the filesystem must be case-sensitive. Returning here also
        // avoids calling fs::equivalent on two missing paths, which throws
        return true;
    }
    // If the two paths don't reference the same file, then
    // the filesystem must be case-sensitive
    return !fs::equivalent(upper, lower);
}

fs::path normalizeCase(const fs::path& path)
{
    // Normalize the case of a path to lower-case if it is on a
    // non-case-sensitive filesystem
    fs::path ret;
    for (const fs::path& component : path) {
        if (!isCaseSensitive(ret / component)) {
            ret /= toLower(component);
        } else {
            ret /= component;
        }
    }
    return ret;
}

fs::path canonicalize(fs::path path)
{
    if (path.empty()) {
        return path;
    }

    // Initial pass to deal with .., ., and symlinks in the existing part
    path = fs::weakly_canonical(path);

    // Figure out if this is absolute or relative by assuming that there
    // is a base path component that will always exist (i.e. / on POSIX or
    // the drive letter on Windows)
    auto [existing, nonExisting] = splitExistingNonExistingParts(path);
    if (!existing.empty()) {
        existing = fs::canonical(fs::absolute(existing));
    } else {
        existing = fs::current_path();
    }

    // Normalize the case of the existing part of the path
    existing = normalizeCase(existing);

    // Need to deal with case-sensitivity of the part of the path
    // that doesn't exist. Assume that part will have the same
    // case-sensitivity as the last component of the existing path
    if (!isCaseSensitive(existing)) {
        path = existing / toLower(nonExisting);
    } else {
        path = existing / nonExisting;
    }

    // Call weakly_canonical again to deal with any existing symlinks that were
    // hidden by .. components after non-existing path components
    fs::path temp;
    while ((temp = fs::weakly_canonical(path)) != path) {
        path = temp;
    }
    return path;
}
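
To answer the original question, you would compare the canonicalized forms of the two paths. The following is a minimal sketch of that comparison. So that it compiles on its own, it uses a simplified stand-in, resolveLoosely (a hypothetical name, not part of the code above), which makes the path absolute and repeats weakly_canonical until the result stops changing, but skips the case normalization entirely. wouldBeSameFile is likewise a hypothetical name; in real code, canonicalize from above would take the place of resolveLoosely.

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical stand-in for canonicalize: resolves as much of the path as
// exists, without requiring the whole path to exist. It does NOT handle
// case-insensitive filesystems.
fs::path resolveLoosely(const fs::path& p)
{
    assert(!p.empty());
    // fs::absolute makes a bare "foo" and "./foo" start from the same base
    fs::path current = fs::weakly_canonical(fs::absolute(p));
    // Repeat until stable, so that symlinks hidden behind ".." components
    // that followed non-existing components are resolved as well
    fs::path next;
    while ((next = fs::weakly_canonical(current)) != current) {
        current = next;
    }
    return current;
}

// True when both paths would name the same file once created
// (under the assumptions stated above).
bool wouldBeSameFile(const fs::path& a, const fs::path& b)
{
    return resolveLoosely(a) == resolveLoosely(b);
}
```

A caller would check wouldBeSameFile(first, second) before creating either file and report an error if it returns true.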