Tags: scala, hadoop, hdfs, safe-mode

Why is my test cluster running in safe mode?


I'm testing some basic HDFS operations like creating directories. I have the following cluster configuration in my test:

import java.io.File

import org.apache.hadoop.fs._
import org.apache.hadoop.fs.permission.FsPermission
import org.apache.hadoop.hdfs.{HdfsConfiguration, MiniDFSCluster}

// ...

private val baseDir = new File("./target/hdfs/test").getAbsoluteFile

private val conf = new HdfsConfiguration()
conf.set(MiniDFSCluster.HDFS_MINIDFS_BASEDIR, baseDir.getAbsolutePath)
conf.setInt("dfs.safemode.threshold.pct", 0)
private val builder = new MiniDFSCluster.Builder(conf)
private val cluster = builder.build()
cluster.waitActive()
private val fs = cluster.getFileSystem

private val host = cluster.getNameNode.getHttpAddress.getHostString
private val port = cluster.getNameNodePort

When I run the tests, I always get this error:

[warn] o.a.h.s.UserGroupInformation - PriviledgedActionException as:erip (auth:SIMPLE) cause:org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory [...]. Name node is in safe mode.
Resources are low on NN. Please add or free up more resources then turn off safe mode manually. NOTE:  If you turn off safe mode before adding resources, the NN will immediately return to safe mode. Use "hdfs dfsadmin -safemode leave" to turn safe mode off.

followed soon after by ...

[info]   org.apache.hadoop.ipc.RemoteException: Cannot create directory [...]. Name node is in safe mode.
[info] Resources are low on NN. Please add or free up more resources then turn off safe mode manually. NOTE:  If you turn off safe mode before adding resources, the NN will immediately return to safe mode. Use "hdfs dfsadmin -safemode leave" to turn safe mode off.

I'm running an in-memory cluster, so I don't know why I'm seeing this. Based on this answer, I thought setting "dfs.safemode.threshold.pct" to 0 would prevent this error, but I was mistaken.

Why is my in-memory test cluster running in safe mode? How do I stop it from doing this?


Solution

  • The problem was the call to cluster.waitActive(), which only waits for the name nodes to be ready. It should have been cluster.waitClusterUp(), which explicitly waits for the cluster to come out of safe mode (see the sketch below).
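
For reference, a minimal sketch of the corrected setup, reusing the configuration from the question; only the wait call changes (the safe-mode threshold setting is omitted here):

import java.io.File

import org.apache.hadoop.hdfs.{HdfsConfiguration, MiniDFSCluster}

private val baseDir = new File("./target/hdfs/test").getAbsoluteFile

private val conf = new HdfsConfiguration()
conf.set(MiniDFSCluster.HDFS_MINIDFS_BASEDIR, baseDir.getAbsolutePath)

private val cluster = new MiniDFSCluster.Builder(conf).build()
cluster.waitClusterUp()   // blocks until the cluster has come out of safe mode
private val fs = cluster.getFileSystem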