I have successfully compiled the JNI-based Apache libhdfs (C++) on my Hadoop Sandbox / CentOS, with no compilation errors or warnings:
g++ test.cpp -o test \
    -I/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.151.x86_64/include/ \
    -I/usr/hdp/2.6.3.0-235/usr/include/ \
    -I/usr/hdp/2.6.3.0-235/hadoop/bin \
    -I/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.151-1.b12.el6_9.x86_64/include/ \
    -I/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.151-1.b12.el6_9.x86_64/jre/lib/amd64/ \
    -L/usr/hdp/2.6.3.0-235/hadoop/lib/ \
    -L/usr/hdp/2.6.3.0-235/hadoop/lib/native \
    -L/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.151-1.b12.el6_9.x86_64/jre/lib/amd64/ \
    -L/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.151-1.b12.el6_9.x86_64/jre/lib/amd64/server/ \
    -lhdfs -pthread -ljvm
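For reference, test.cpp itself is not shown above; a minimal sketch of what such a libhdfs test typically looks like (modeled on the standard example from the Apache libhdfs docs, with /tmp/testfile.txt taken from the error output below) would be:

// Hypothetical minimal test.cpp, modeled on the libhdfs docs example;
// the actual source is not shown in this question.
#include "hdfs.h"
#include <cstdio>
#include <cstring>
#include <fcntl.h>

int main() {
    // Connect to the namenode named in the Hadoop config files,
    // which libhdfs locates through the CLASSPATH.
    hdfsFS fs = hdfsConnect("default", 0);
    if (!fs) {
        std::fprintf(stderr, "hdfsConnect failed\n");
        return 1;
    }
    const char* path = "/tmp/testfile.txt";
    hdfsFile f = hdfsOpenFile(fs, path, O_WRONLY | O_CREAT, 0, 0, 0);
    if (!f) {
        std::fprintf(stderr, "Failed to open %s for writing\n", path);
        return 1;
    }
    const char* msg = "Hello, HDFS!";
    hdfsWrite(fs, f, (void*)msg, std::strlen(msg) + 1);
    hdfsFlush(fs, f);
    hdfsCloseFile(fs, f);
    hdfsDisconnect(fs);
    return 0;
}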
When I try to run the code, I get the following errors:
[root@sandbox-hdp ~]# ./test
Environment variable CLASSPATH not set!
getJNIEnv: getGlobalJNIEnv failed
Environment variable CLASSPATH not set!
getJNIEnv: getGlobalJNIEnv failed
If I run hadoop classpath in the terminal, I get the following output:
[root@sandbox-hdp ~]# hadoop classpath
/usr/hdp/2.6.3.0-235/hadoop/conf:/usr/hdp/2.6.3.0-235/hadoop/lib/*:/usr/hdp/2.6.3.0-235/hadoop/.//*:/usr/hdp/2.6.3.0-235/hadoop-hdfs/./:/usr/hdp/2.6.3.0-235/hadoop-hdfs/lib/*:/usr/hdp/2.6.3.0-235/hadoop-hdfs/.//*:/usr/hdp/2.6.3.0-235/hadoop-yarn/lib/*:/usr/hdp/2.6.3.0-235/hadoop-yarn/.//*:/usr/hdp/2.6.3.0-235/hadoop-mapreduce/lib/*:/usr/hdp/2.6.3.0-235/hadoop-mapreduce/.//*::jdbc-mysql.jar:mysql-connector-java-5.1.17.jar:mysql-connector-java-5.1.37.jar:mysql-connector-java.jar:/usr/hdp/2.6.3.0-235/tez/*:/usr/hdp/2.6.3.0-235/tez/lib/*:/usr/hdp/2.6.3.0-235/tez/conf
On the Apache libhdfs page it says:
The most common problem is the CLASSPATH is not set properly when calling a program that uses libhdfs. Make sure you set it to all the Hadoop jars needed to run Hadoop itself as well as the right configuration directory containing hdfs-site.xml. It is not valid to use wildcard syntax for specifying multiple jars. It may be useful to run hadoop classpath --glob or hadoop classpath --jar to generate the correct classpath for your deployment. See Hadoop Commands Reference for more information on this command.
After many trial-and-error attempts, however, I still do not understand how to proceed, so I would appreciate any help in solving this problem.
Edit: I tried the following:
CLASSPATH=`hadoop classpath`
./test
...which gave me the following error: libjvm.so: cannot open shared object file: No such file or directory
I tried the following: export LD_LIBRARY_PATH=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.151-1.b12.el6_9.x86_64/jre/lib/amd64/server
...and now the error is:
[root@sandbox-hdp ~]# CLASSPATH=$CLASSPATH:`hadoop classpath` ./test
loadFileSystems error:
(unable to get stack trace for java.lang.NoClassDefFoundError exception: ExceptionUtils::getStackTrace error.)
hdfsBuilderConnect(forceNewInstance=0, nn=default, port=0, kerbTicketCachePath=(NULL), userName=(NULL)) error:
(unable to get stack trace for java.lang.NoClassDefFoundError exception: ExceptionUtils::getStackTrace error.)
hdfsOpenFile(/tmp/testfile.txt): constructNewObjectOfPath error:
(unable to get stack trace for java.lang.NoClassDefFoundError exception: ExceptionUtils::getStackTrace error.)
Maybe the following could work for you (export the variable first, then run the program as a separate command):
export CLASSPATH=$CLASSPATH:`hadoop classpath`
./test
or only this:
export CLASSPATH=`hadoop classpath`
./test
Also check the JAVA_HOME environment variable; it can change which Java libraries are used, too.
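For example (a sketch reusing the JDK path from your compile command; adjust it to your installation):

echo "$JAVA_HOME"    # verify which JDK is currently configured
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.151-1.b12.el6_9.x86_64
export LD_LIBRARY_PATH="$JAVA_HOME/jre/lib/amd64/server:$LD_LIBRARY_PATH"

The last line also keeps the libjvm.so fix from your edit in effect.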
And finally, a wrapper like the script below could be useful:
#!/bin/bash
# Replace "AllTheJARs" with the full Hadoop classpath before launching.
export CLASSPATH="AllTheJARs"
ARG0="$0"
EXEC_PATH="$( dirname "$ARG0" )"
"${EXEC_PATH}/test" "$@"