Tags: linux, windows, filenames, nfs, google-cloud-filestore

Preserve filenames on NFS between Windows/Linux


Is there any way of configuring the NFSClient or how the share is mounted on Windows or Linux so that I can preserve filenames across systems?

Currently we have a large number of files that were written on Windows and have now been moved to Google Filestore (NFSv3) so that they can be accessed from other servers. The problem is that many of the file names contain Swedish characters (Å, Ä, Ö), and when these files are listed on the system other than the one that created them, the names become unreadable. (The file contents are fine; only the names are affected.)

Currently I am planning on programmatically renaming all the files to remove the offending characters, but would prefer to not have to do this if possible.

Below is an example of how it looks from the Windows and Linux sides. The Linux file was created on Linux and the Windows file was created on Windows.

Linux

[screenshot: directory listing as seen from Linux]

Windows

[screenshot: directory listing as seen from Windows]


Solution

  • This answer may not help you fix the problem, but I thought I'd give some theoretical overview that might help your (and other people's) research.

    You might also want to read this.

    Anyway, here we go:

    There's a whole lot at play here.

    Filesystems

    On Linux, file names have only two rules: they cannot contain a slash (/), and they cannot contain the null byte (\0). Beyond that, a name is just a sequence of bytes. ASCII and UTF-8 never use those two bytes for anything else, so both are compatible with the rule, and they are basically the encodings you will find on Linux filesystems.
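    As a rough illustration of the rule above, here is a minimal sketch of a validity check that works on raw bytes. The helper name `is_valid_linux_name` is made up for this example, and it deliberately ignores length limits. The point is that the same name encoded as UTF-8 or as Latin-1 passes equally well, so the server will store whichever bytes the client happens to send.

```python
def is_valid_linux_name(name: bytes) -> bool:
    """Return True if `name` is acceptable as a single path component.

    Hypothetical helper for illustration; it checks only the two
    byte-level rules plus the reserved entries "", "." and "..".
    """
    if name in (b"", b".", b".."):
        return False
    # The only forbidden bytes are the path separator and NUL.
    return b"/" not in name and b"\0" not in name


# The same visible name, as two different byte sequences.
utf8_name = "ÅÄÖ.txt".encode("utf-8")
latin1_name = "ÅÄÖ.txt".encode("latin-1")

both_valid = is_valid_linux_name(utf8_name) and is_valid_linux_name(latin1_name)
same_bytes = utf8_name == latin1_name
```

    Here `both_valid` is true while `same_bytes` is false: the filesystem accepts either form and records no hint about which encoding was meant.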

    Windows has different ideas: NTFS stores names as UTF-16, so something has to convert them to bytes before they reach an NFS server. Some configuration might be needed to make the Windows side emit names in a different encoding.

    Creating & Listing files

    On Linux, your file names are almost always encoded in UTF-8. ls and its kin generally don't think too much about it; they rely on the rule above and pass the bytes along.

    Windows' dir obviously knows how to work with NTFS's character encoding, but can it read Linux's UTF-8 file names? To the best of my understanding, it can with some configuration.
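    To make the symptom concrete, here is a small sketch of what happens when a name is written in one encoding and listed in another. It assumes the Windows side uses code page 1252 (a common Western European default); I have not confirmed which encoding your Windows NFS client actually uses, so treat `cp1252` as a placeholder.

```python
name = "ÅÄÖ.txt"

# What a typical Linux client would write.
utf8_bytes = name.encode("utf-8")
# What a Windows client might write (assumed code page, see above).
cp1252_bytes = name.encode("cp1252")

# A reader that assumes the wrong encoding sees garbage.
utf8_read_as_cp1252 = utf8_bytes.decode("cp1252")
cp1252_read_as_utf8 = cp1252_bytes.decode("utf-8", errors="replace")

print(repr(utf8_read_as_cp1252))
print(repr(cp1252_read_as_utf8))
```

    If the garbled names in your listings look like either of these patterns (pairs of accented Latin characters on one side, replacement characters or question marks on the other), that is a strong hint that the bytes on the share are intact and only the interpretation differs.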

    Terminal

    Modern Linux terminal programs are all UTF-8, but support for other character sets (because Windows) might need to be installed.

    On Windows, UTF-8 in the terminal seems not to have been fully supported as of last year. Maybe that has changed, or maybe you'd need another terminal. The configuration mentioned above might help.

    NFS

    NFSv4.1 and up have explicit support for UTF-8 and an explicit goal of Unix <-> Windows interoperability.

    NFSv3 does not have any of that, and support for anything non-ASCII is not guaranteed.

    I found one implementation which supports UTF-8 over NFSv3, but Google Filestore's documentation only says "supports any NFSv3-compatible client".

    What to Do

    Go ahead and rename the files. Interoperability has even more issues than encoding, e.g. different conceptions of which characters are reserved. There are so many limitations that your best bet is to make sure all file names are plain ASCII. I would even avoid things like spaces in file names; it makes life a whole lot easier.
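    If you do go the renaming route, something along these lines might serve as a starting point. It is only a sketch: `to_ascii_name` and `plan_renames` are names I made up, the set of allowed characters is my own conservative choice, and it assumes it runs on the side where the names currently decode correctly. It only plans the renames, so you can review the list before touching anything.

```python
import os
import re
import unicodedata


def to_ascii_name(name: str) -> str:
    """Map a file name to plain ASCII (hypothetical helper)."""
    # Decompose accented letters (Å becomes A + combining ring),
    # then drop everything that is not ASCII.
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Replace spaces and anything outside a conservative set with "_".
    return re.sub(r"[^A-Za-z0-9._-]+", "_", ascii_only)


def plan_renames(root: str):
    """Yield (old_path, new_path) pairs without touching the disk."""
    # Walk bottom-up so children are listed before their parent directory.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames + dirnames:
            new_name = to_ascii_name(name)
            if new_name and new_name != name:
                yield (os.path.join(dirpath, name),
                       os.path.join(dirpath, new_name))


example = to_ascii_name("Årsredovisning Älvsjö.txt")
```

    Two different names can collapse to the same ASCII name (for example "Älg.txt" and "Alg.txt"), so check each planned target with os.path.exists before calling os.rename, and keep a log of old and new names in case anything needs to be mapped back.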