databasedatabase-designfile-ioconsistencygfs

Does the Google File System allow listing directory contents?


In the GFS paper, Section 4.1 describes how GFS is able to make concurrent mutations within a directory while only requiring a read lock on the directory for each client - there's no actual inode in GFS, so clients are free to create, remove, or mutate /x/y/somefile while only requiring a read lock on /x/ and /x/y/.

If there are no inodes, then is it still possible to maintain an explicit tree structure? The only way I can see this working is if the master maintains a flattened, 1-dimensional mapping from directory or file names to their metadata, allowing for fast file creation and manipulation.

Suppose that some client of GFS wanted to scan the names of all files in a directory - for instance, ls. Without an iteration over all metadata nodes, how is this possible?

It might be possible for a client to maintain their own version of what they think the directory tree looks like in the GFS, but this will only work if each client keeps to their own directory.


Solution

  • A master lookup table offers access to a single conceptual tree of nodes. It does this by listing all paths of names to nodes. Some nodes are directories. The only data is owned by non-directory leaf nodes. Eg these paths:

    /a/b/foo
    /a/b/c/bar
    /a/baz/
    

    describe this tree:

    \
     a/--b/--foo
     |  \
     |   c/--bar
     baz/
    

    Every path identifies a node. The nodes that are the children of a node are the ones whose paths are one name longer in the lookup table. To list a node's children nodes is to list all the paths in the lookup table that are one name longer than its path. What the paper means by metatdata is info like whether and how a node is locked and for a non-directory leaf node where its (unshared) data is.

    One doesn't navigate by visiting directory nodes that own data that gives child and parent node names and whether they are directories, as in Unix/Linux. Copying a leaf means copying its data to another leaf's, like Unix/Linux cat, not cp. I presume one can copy a subtree, which would add new paths in the lookup table and copy data for non-directory leaves.

    One cannot use technical terms like "file" or "directory" as if they mean the same thing in both systems. What one can do is consider GFS and Unix/Linux to both manage the same kind of tree of paths of names through directory nodes to directory leaves and non-directory data-owning leaves. But after that the other parts of the file system state (metadata and data) and their operators differ. In your mind put "GFS-" and "Unix/Linux-" in front of every technical term other than those refering to trees of named nodes.

    EDIT: Examples.

    1.

    Suppose that some client of GFS wanted to scan the names of all files in a directory - for instance, ls. Without an iteration over all metadata nodes, how is this possible?

    A directory listing would return the paths in the lookup table that extend the given directory's path. GFS will offer file server commands to do such things or that support doing such things, hiding its implementation. It would be enough (but slow) to be able iterate through the lookup table. Eg ls /a/b:

    /a/b/foo
    /a/b/c/bar
    

    2. To copy source node children to be target node children: For each path that extends the source's path, add to the lookup table the path got by replacing that prefix by the target path. Presumably the copy command creating the new nodes copies associated data for non-directories. Eg copy children of /a/ to /a/b/c/ adds:

    /a/b/c/b/foo
    /a/b/c/b/c/bar
    /a/b/c/baz/
    

    giving:

    \
     a/--b/--foo
     |   \
     |    c/--bar
     |    |--b/--foo
     |    |  \
     |    |   c/--bar
     |    baz/
     baz/