c++ c++17 std-filesystem

Version of std::filesystem::equivalent for non-existing files


My program is supposed to create two files at user-specified paths. I need to know whether the two paths lead to the same location, so that I can fail with an error before I start changing the filesystem.

Because the paths come from the user, they are expected to be non-canonical and weird. For example, they could be ./dir1/subdir/file and dir2/subdir/../subdir/file, where dir2 is a symlink to dir1 and subdir doesn't exist yet. The expected result is still true: they are equivalent.

std::filesystem::equivalent works only on files that already exist. Is there a similar function without this limitation?
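
A minimal sketch of that limitation, using the non-throwing overload so the failure shows up as an error code rather than an exception. The helper name sameExistingFile is hypothetical, and the exact error code reported for missing files varies between standard library implementations.

```cpp
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: true only when both paths exist and name the same file.
// `failed` is set when fs::equivalent reported an error, which is what
// happens when the paths do not exist yet.
bool sameExistingFile(const fs::path& a, const fs::path& b, bool& failed)
{
    std::error_code ec;
    const bool same = fs::equivalent(a, b, ec);  // non-throwing overload
    failed = static_cast<bool>(ec);
    return same && !failed;
}
```

For two paths that have not been created yet, the call reports an error instead of answering the question, so it cannot be used as a pre-flight check.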


Solution

  • This is a surprisingly difficult problem to solve, and no single standard library function will do it.

    There are several cases that you need to worry about: relative versus absolute paths, . and .. components, symlinks, and case-insensitive filesystems.

    std::filesystem::weakly_canonical will get you part of the way there, but not all the way by itself. For instance, it doesn't handle a bare relative path that doesn't exist (e.g. foo won't canonicalize to the same thing as ./foo), and it doesn't even try to address case sensitivity.
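
As a sketch of the relative-path gap and one way around it: making the path absolute before calling weakly_canonical gives foo and ./foo the same starting point, so both resolve against the current directory. The helper name resolveFromCwd is hypothetical. This is a simpler stand-in than the full function in this answer, and it ignores case sensitivity entirely.

```cpp
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical helper: anchor the path at the current directory first, then
// let weakly_canonical resolve symlinks in the existing prefix and normalize
// "." and ".." lexically in the part that does not exist yet.
fs::path resolveFromCwd(const fs::path& p)
{
    return fs::weakly_canonical(fs::absolute(p));
}
```

With this, resolveFromCwd("foo") and resolveFromCwd("./foo") compare equal even when foo does not exist.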

    Here's a canonicalize function that takes all of that into account. It still has some shortcomings, mainly around non-ASCII characters (e.g. the case normalization doesn't work for 'É'), but it should work in most cases:

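    #include <algorithm>
    #include <filesystem>
    #include <iterator>
    #include <locale>
    #include <utility>
    
    namespace fs = std::filesystem;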
    
    std::pair<fs::path, fs::path> splitExistingNonExistingParts(const fs::path& path)
    {
        fs::path existingPart = path;
        while (!existingPart.empty() && !fs::exists(existingPart)) {
            existingPart = existingPart.parent_path();
        }
        return {existingPart, fs::relative(path, existingPart)};
    }
    
    fs::path toUpper(const fs::path& path)
    {
        // Upper-case every character of the native path string using the
        // global locale
        const fs::path::string_type& native = path.native();
        fs::path::string_type upper;
        upper.reserve(native.length());
        std::transform(
            native.begin(),
            native.end(),
            std::back_inserter(upper),
            [](auto c) { return std::toupper(c, std::locale()); }
        );
        return upper;
    }
    
    fs::path toLower(const fs::path& path)
    {
        const fs::path::string_type& native = path.native();
        fs::path::string_type lower;
        lower.reserve(native.length());
        std::transform(
            native.begin(),
            native.end(),
            std::back_inserter(lower),
            [](auto c) { return std::tolower(c, std::locale()); }
        );
        return lower;
    }
    
    bool isCaseSensitive(const fs::path& path)
    {
        // NOTE: This function assumes the path exists.
        //       fs::equivalent will throw if that isn't the case
    
        fs::path upper = path.parent_path() / toUpper(*(--path.end()));
        fs::path lower = path.parent_path() / toLower(*(--path.end()));
    
        bool exists = fs::exists(upper);
        if (exists != fs::exists(lower)) {
            // If one exists and the other doesn't, then they
            // must reference different files and therefore be
            // case-sensitive
            return true;
        }
    
        // If the two paths don't reference the same file, then
        // the filesystem must be case-sensitive
        return !fs::equivalent(upper, lower);
    }
    
    fs::path normalizeCase(const fs::path& path)
    {
        // Normalize the case of a path to lower-case if it is on a
        // non-case-sensitive filesystem
    
        fs::path ret;
        for (const fs::path& component : path) {
            if (!isCaseSensitive(ret / component)) {
                ret /= toLower(component);
            } else {
                ret /= component;
            }
        }
        return ret;
    }
    
    fs::path canonicalize(fs::path path)
    {
        if (path.empty()) {
            return path;
        }
    
        // Initial pass to deal with .., ., and symlinks in the existing part
        path = fs::weakly_canonical(path);
    
        // Figure out if this is absolute or relative by assuming that there
        // is a base path component that will always exist (i.e. / on POSIX or
        // the drive letter on Windows)
        auto [existing, nonExisting] = splitExistingNonExistingParts(path);
        if (!existing.empty()) {
            existing = fs::canonical(fs::absolute(existing));
        } else {
            existing = fs::current_path();
        }
    
        // Normalize the case of the existing part of the path
        existing = normalizeCase(existing);
    
        // Need to deal with case-sensitivity of the part of the path
        // that doesn't exist.  Assume that part will have the same
        // case-sensitivity as the last component of the existing path
        if (!isCaseSensitive(existing)) {
            path = existing / toLower(nonExisting);
        } else {
            path = existing / nonExisting;
        }
    
        // Call weakly_canonical again to deal with any existing symlinks that were
        // hidden by .. components after non-existing path components
        fs::path temp;
        while ((temp = fs::weakly_canonical(path)) != path) {
            path = temp;
        }
        return path;
    }
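
To tie this back to the question, the check before touching the filesystem might look like the sketch below. The name pathsCollide and the Canonicalizer alias are hypothetical. The canonicalizer is passed in as a parameter so that either the canonicalize function above or a simpler stand-in can be used.

```cpp
#include <filesystem>
#include <functional>

namespace fs = std::filesystem;

// Hypothetical alias: any callable that maps a user path to a canonical form.
using Canonicalizer = std::function<fs::path(const fs::path&)>;

// Returns true when both user-supplied paths canonicalize to the same
// location, i.e. creating both files would write to the same place.
bool pathsCollide(const fs::path& first, const fs::path& second,
                  const Canonicalizer& canon)
{
    return canon(first) == canon(second);
}
```

A call such as pathsCollide(a, b, canonicalize) can then reject the input before any file is created.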
    
