scala, hadoop, hdfs, mkdirs

How can I prevent Hadoop's HDFS API from creating parent directories?


I want HDFS commands to fail if a parent directory doesn't exist when making subdirectories. When I use any of the FileSystem#mkdirs overloads, no exception is raised; instead, the missing parent directories are created silently:

import java.util.UUID
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val conf = new Configuration()
conf.set("fs.defaultFS", s"hdfs://$host:$port")

val fileSystem = FileSystem.get(conf)
val cwd = fileSystem.getWorkingDirectory

// Guarantee non-existence by appending two UUIDs.
val dirToCreate = new Path(cwd, new Path(UUID.randomUUID.toString, UUID.randomUUID.toString))

// Succeeds, silently creating the missing parent directory.
fileSystem.mkdirs(dirToCreate)

How can I force HDFS to throw an exception if a parent directory doesn't exist, without the cumbersome burden of checking for its existence first?
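
For reference, this is roughly the kind of check I'd like to avoid (a sketch reusing fileSystem and dirToCreate from the snippet above; the exception type is just my own choice):

import java.io.FileNotFoundException

// Manually verify the parent before creating the subdirectory.
val parent = dirToCreate.getParent
if (!fileSystem.exists(parent))
  throw new FileNotFoundException(s"Parent directory doesn't exist: $parent")

fileSystem.mkdirs(dirToCreate)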


Solution

  • The FileSystem API does not support this behavior. Use FileContext#mkdir instead; for example:

    import java.util.UUID
    import org.apache.hadoop.fs.{FileContext, Path}
    import org.apache.hadoop.fs.permission.FsPermission
    
    val files = FileContext.getFileContext()
    val cwd = files.getWorkingDirectory
    // Permissions to apply to the new directory, as an octal string.
    val permissions = new FsPermission("644")
    // Do not create missing parent directories.
    val createParent = false
    
    // Guarantee non-existence by appending two UUIDs.
    val dirToCreate = new Path(cwd, new Path(UUID.randomUUID.toString, UUID.randomUUID.toString))
    
    // Throws because createParent is false and the parent (the first UUID) doesn't exist.
    files.mkdir(dirToCreate, permissions, createParent)
    

    The above example will throw:

    java.io.FileNotFoundException: Parent directory doesn't exist: /user/erip/f425a2c9-1007-487b-8488-d73d447c6f79
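
    Note that FileContext.getFileContext() with no arguments resolves the default file system from the configuration on the classpath. To target the same HDFS instance the question configures via fs.defaultFS, you can pass a URI and Configuration explicitly; a minimal sketch, with host and port as placeholders just as in the question:

    import java.net.URI
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.FileContext

    val conf = new Configuration()
    // Bind the FileContext to a specific HDFS rather than the classpath default.
    val files = FileContext.getFileContext(new URI(s"hdfs://$host:$port"), conf)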