hadoopbigtop

Namenode appears to hang on start


Version is 2.0.4

I'm using the Cloudera bigtop debian packages from here http://bigtop01.cloudera.org:8080/job/Bigtop-0.6.0-rvs/ws/, but I actually encounter the same issue getting the latest jars from Apache mirrors and configuring myself.

Also, for what it's worth, I found a debian package for 1.1.2 from Apache and do not have this issue.

So starting the namenode, I get this and then it freezes.

$ sudo -u hdfs hdfs namenode
13/06/26 03:47:44 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = beagle4/127.0.0.1
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 2.0.4-SNAPSHOT
STARTUP_MSG:   classpath = /etc/hadoop/conf:/usr/lib/hadoop/lib/kfs-0.3.jar:/usr/lib/hadoop/lib/snappy-java-1.0.3.2.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/zookeeper-3.4.2.jar:/usr/lib/hadoop/lib/avro-1.5.3.jar:/usr/lib/hadoop/lib/jetty-6.1.26.jar:/usr/lib/hadoop/lib/commons-lang-2.5.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.jar:/usr/lib/hadoop/lib/jsr305-1.3.9.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/stax-api-1.0.1.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/.//hadoop-auth-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop/.//hadoop-common-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop/.//hadoop-common-2.0.4-SNAPSHOT-tests.jar:/usr/lib/hadoop/.//hadoop-annotations-2.0.4-SNAPSHOT.jar:/contrib/capacity-scheduler/*.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/jsp-api-2.1.jar:/usr/lib/hadoop-hdfs/lib/jetty-6.1.26.jar:/usr/lib/hadoop-hdfs/lib/commons-lang-2.5.jar:/usr/lib/hadoop-hdfs/lib/jetty-util-6.1.26.jar:/usr/lib/hadoop-hdfs/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-hdfs/lib/asm-3.2.jar:/usr/lib/hadoop-hdfs/lib/jersey-server-1.8.jar:/usr/lib/hadoop-hdfs/lib/commons-el-1.0.jar:/usr/lib/hadoop-hdfs/lib/log4j-1.2.17.jar:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.3.jar:/usr/lib/hadoop-hdfs/lib/jersey-core-1.8.jar:/usr/lib/hadoop-hdfs/lib/commons-io-2.1.jar:/usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar:/usr/lib/hadoop-hdfs/lib/commons-cli-1.2.jar:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/commons-codec-1.4.jar:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-hdfs/lib/xmlenc-0.52.jar:/usr/lib/hadoop-hdfs/lib/servlet-api-2.5.jar:/usr/lib/hadoop-hdfs/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.4-SNAPSHOT-tests.jar:/usr/lib/hadoop-yarn/lib/snappy-java-1.0.3.2.jar:/usr/lib/hadoop-yarn/lib/paranamer-2.3.jar:/usr/lib/hadoop-yarn/lib/javax.inject-1.jar:/usr/lib/hadoop-yarn/lib/netty-3.5.11.Final.jar:/usr/lib/hadoop-yarn/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-yarn/lib/avro-1.5.3.jar:/usr/lib/hadoop-yarn/lib/asm-3.2.jar:/usr/lib/hadoop-yarn/lib/jersey-server-1.8.jar:/usr/lib/hadoop-yarn/lib/guice-3.0.jar:/usr/lib/hadoop-yarn/lib/log4j-1.2.17.jar:/usr/lib/hadoop-yarn/lib/jersey-core-1.8.jar:/usr/lib/hadoop-yarn/lib/commons-io-2.1.jar:/usr/lib/hadoop-yarn/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-yarn/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-yarn/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/junit-4.8.2.jar:/usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/lib/hadoop-yarn/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-client-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-unmanaged-am-launcher-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop-mapreduce/lib/snappy-java-1.0.3.2.jar:/usr/lib/hadoop-mapreduce/lib/paranamer-2.3.jar:/usr/lib/hadoop-mapreduce/lib/javax.inject-1.jar:/usr/lib/hadoop-mapreduce/lib/netty-3.5.11.Final.jar:/usr/lib/hadoop-mapreduce/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-mapreduce/lib/avro-1.5.3.jar:/usr/lib/hadoop-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-mapreduce/lib/jersey-server-1.8.jar:/usr/lib/hadoop-mapreduce/lib/guice-3.0.jar:/usr/lib/hadoop-mapreduce/lib/log4j-1.2.17.jar:/usr/lib/hadoop-mapreduce/lib/jersey-core-1.8.jar:/usr/lib/hadoop-mapreduce/lib/commons-io-2.1.jar:/usr/lib/hadoop-mapreduce/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-mapreduce/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-mapreduce/lib/junit-4.8.2.jar:/usr/lib/hadoop-mapreduce/lib/aopalliance-1.0.jar:/usr/lib/hadoop-mapreduce/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-mapreduce/.//hadoop-rumen-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-client-core-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop-mapreduce/.//hadoop-extras-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop-mapreduce/.//hadoop-distcp-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop-mapreduce/.//hadoop-datajoin-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-client-jobclient-2.0.4-SNAPSHOT-tests.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-client-jobclient-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-client-app-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-client-hs-plugins-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop-mapreduce/.//hadoop-streaming-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop-mapreduce/.//hadoop-gridmix-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-client-common-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-examples-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop-mapreduce/.//hadoop-archives-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-client-shuffle-2.0.4-SNAPSHOT.jar:/usr/lib/hadoop-mapreduce/.//hadoop-mapreduce-client-hs-2.0.4-SNAPSHOT.jar
STARTUP_MSG:   build = git://github.com/rvs/bigtop.git -r 9be2ff0b8cc3f0ba0c45e5c917aa628aca5259be; compiled by 'jenkins' on 2013-03-26T01:23Z
STARTUP_MSG:   java = 1.7.0_21
************************************************************/
13/06/26 03:47:49 DEBUG impl.MetricsSystemImpl: from system property: null
13/06/26 03:47:49 DEBUG impl.MetricsSystemImpl: from environment variable: null

Here are the parameters being passed to the namenode.

-Djava.net.preferIPv4Stack=true 
-Dhadoop.log.dir=/usr/lib/hadoop/logs 
-Dhadoop.log.file=hadoop.log 
-Dhadoop.home.dir=/usr/lib/hadoop 
-Dhadoop.id.str=hdfs 
-Dhadoop.root.logger=DEBUG,console 
-Djava.library.path=/usr/lib/hadoop/lib/native 
-Dhadoop.policy.file=hadoop-policy.xml 
-Djava.net.preferIPv4Stack=true 
-Dhadoop.security.logger=INFO,RFAS 
-Dhdfs.audit.logger=INFO,NullAppender 
-Dhadoop.security.logger=INFO,NullAppender

It's annoying because it's not failing, it is just hanging... any ideas on how I should debug this?

EDIT : Ok, I'm stumped. If I add the java opts to listen for a Java debugger, it works? I don't understand...

# works?
exec "$JAVA" -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=1044 -Dproc_$COMMAND $JAVA_HEAP_MAX $HA$

# doesn't work
exec "$JAVA" -Dproc_$COMMAND $JAVA_HEAP_MAX $HADOOP_OPTS $CLASS "$@"

EDIT 2: jstack output - starting to home in on the issue

Full thread dump OpenJDK Zero VM (22.0-b10 mixed mode):

"Service Thread" daemon prio=10 tid=0xb5fd7460 nid=0xc7c runnable [0x00000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0xb5fd5978 nid=0xc7b waiting on condition [0x00000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=10 tid=0xb5f935b8 nid=0xc7a in Object.wait() [0x70d7e000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x86413228> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
    - locked <0x86413228> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:189)

"Reference Handler" daemon prio=10 tid=0xb5f91c08 nid=0xc79 in Object.wait() [0x70efe000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x864132b0> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:503)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
    - locked <0x864132b0> (a java.lang.ref.Reference$Lock)

"main" prio=10 tid=0xb5f073e0 nid=0xc77 runnable [0xb6210000]
   java.lang.Thread.State: RUNNABLE
    at com.google.common.base.CharMatcher$12.setBits(CharMatcher.java:507)
    at com.google.common.base.CharMatcher$Or.setBits(CharMatcher.java:642)
    at com.google.common.base.CharMatcher.precomputedInternal(CharMatcher.java:673)
    at com.google.common.base.Platform.precomputeCharMatcher(Platform.java:53)
    at com.google.common.base.CharMatcher.precomputed(CharMatcher.java:657)
    at com.google.common.base.CharMatcher.<clinit>(CharMatcher.java:179)
    at com.google.common.base.Splitter.on(Splitter.java:125)
    at org.apache.hadoop.metrics2.impl.MetricsConfig.<clinit>(MetricsConfig.java:84)
    at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.configure(MetricsSystemImpl.java:446)
    - locked <0x718606c8> (a org.apache.hadoop.metrics2.impl.MetricsSystemImpl)
    at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.start(MetricsSystemImpl.java:180)
    - locked <0x718606c8> (a org.apache.hadoop.metrics2.impl.MetricsSystemImpl)
    at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.init(MetricsSystemImpl.java:156)
    - locked <0x718606c8> (a org.apache.hadoop.metrics2.impl.MetricsSystemImpl)
    at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.init(DefaultMetricsSystem.java:54)
    at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.initialize(DefaultMetricsSystem.java:50)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1168)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1235)

"VM Thread" prio=10 tid=0xb5f8b6b0 nid=0xc78 runnable 

"VM Periodic Task Thread" prio=10 tid=0xb5fd90b0 nid=0xc7d waiting on condition 

JNI global references: 137

Heap
 def new generation   total 2304K, used 1789K [0x71660000, 0x718e0000, 0x863a0000)
  eden space 2048K,  74% used [0x71660000, 0x717df758, 0x71860000)
  from space 256K, 100% used [0x71860000, 0x718a0000, 0x718a0000)
  to   space 256K,   0% used [0x718a0000, 0x718a0000, 0x718e0000)
 tenured generation   total 5376K, used 2333K [0x863a0000, 0x868e0000, 0xafe60000)
   the space 5376K,  43% used [0x863a0000, 0x865e76f8, 0x865e7800, 0x868e0000)
 compacting perm gen  total 5632K, used 5487K [0xafe60000, 0xb03e0000, 0xb3e60000)
   the space 5632K,  97% used [0xafe60000, 0xb03bbf28, 0xb03bc000, 0xb03e0000)
No shared spaces configured.

Solution

  • The problem lies in this jar.

    guava-11.0.2.jar
    

    Using jstack I was able to trace the lock back to

    com.google.common.base.CharMatcher$12.setBits(CharMatcher.java:507)
    

    I don't know why it causes my namenode/datanode services to hang, but the solution was to download the latest version (currently 14.0.1) and all of my problems went away.