I am trying to set up Tachyon on the S3 filesystem. I am completely new to Tachyon and am still reading whatever I can find on it. My tachyon-env.sh is given below:
#!/usr/bin/env bash
# This file contains environment variables required to run Tachyon. Copy it as tachyon-env.sh and
# edit that to configure Tachyon for your site. At a minimum,
# the following variables should be set:
#
# - JAVA_HOME, to point to your JAVA installation
# - TACHYON_MASTER_ADDRESS, to bind the master to a different IP address or hostname
# - TACHYON_UNDERFS_ADDRESS, to set the under filesystem address.
# - TACHYON_WORKER_MEMORY_SIZE, to set how much memory to use (e.g. 1000mb, 2gb) per worker
# - TACHYON_RAM_FOLDER, to set where worker stores in memory data
# - TACHYON_UNDERFS_HDFS_IMPL, to set which HDFS implementation to use (e.g. com.mapr.fs.MapRFileSystem,
# org.apache.hadoop.hdfs.DistributedFileSystem)
# The following gives an example:
if [[ `uname -a` == Darwin* ]]; then
  # Assuming Mac OS X
  export JAVA_HOME=${JAVA_HOME:-$(/usr/libexec/java_home)}
  export TACHYON_RAM_FOLDER=/Volumes/ramdisk
  export TACHYON_JAVA_OPTS="-Djava.security.krb5.realm= -Djava.security.krb5.kdc="
else
  # Assuming Linux
  if [ -z "$JAVA_HOME" ]; then
    export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
  fi
  export TACHYON_RAM_FOLDER=/mnt/ramdisk
fi
export JAVA="$JAVA_HOME/bin/java"
export TACHYON_MASTER_ADDRESS=localhost
export TACHYON_UNDERFS_ADDRESS=s3n://test
#export TACHYON_UNDERFS_ADDRESS=hdfs://localhost:9000
export TACHYON_WORKER_MEMORY_SIZE=0.5GB
export TACHYON_UNDERFS_HDFS_IMPL=org.apache.hadoop.hdfs.DistributedFileSystem
CONF_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
export TACHYON_JAVA_OPTS+="
-Dlog4j.configuration=file:$CONF_DIR/log4j.properties
-Dtachyon.debug=false
-Dtachyon.underfs.address=$TACHYON_UNDERFS_ADDRESS
-Dtachyon.underfs.hdfs.impl=$TACHYON_UNDERFS_HDFS_IMPL
-Dtachyon.data.folder=$TACHYON_UNDERFS_ADDRESS/tmp/tachyon/data
-Dtachyon.workers.folder=$TACHYON_UNDERFS_ADDRESS/tmp/tachyon/workers
-Dtachyon.worker.memory.size=$TACHYON_WORKER_MEMORY_SIZE
-Dtachyon.worker.data.folder=$TACHYON_RAM_FOLDER/tachyonworker/
-Dtachyon.master.worker.timeout.ms=60000
-Dtachyon.master.hostname=$TACHYON_MASTER_ADDRESS
-Dtachyon.master.journal.folder=$TACHYON_HOME/journal/
-Dorg.apache.jasper.compiler.disablejsr199=true
-Djava.net.preferIPv4Stack=true
-Dfs.s3n.awsAccessKeyId=123
-Dfs.s3n.awsSecretAccessKey=456
"
# Master specific parameters. Default to TACHYON_JAVA_OPTS.
export TACHYON_MASTER_JAVA_OPTS="$TACHYON_JAVA_OPTS"
# Worker specific parameters that will be shared to all workers. Default to TACHYON_JAVA_OPTS.
export TACHYON_WORKER_JAVA_OPTS="$TACHYON_JAVA_OPTS"
However, when I am trying to format Tachyon, I am getting the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/jets3t/service/S3ServiceException
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.createDefaultStore(NativeS3FileSystem.java:224)
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize(NativeS3FileSystem.java:214)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
at tachyon.UnderFileSystemHdfs.<init>(UnderFileSystemHdfs.java:89)
at tachyon.UnderFileSystemHdfs.getClient(UnderFileSystemHdfs.java:56)
at tachyon.UnderFileSystem.get(UnderFileSystem.java:69)
at tachyon.UnderFileSystem.get(UnderFileSystem.java:54)
at tachyon.Format.formatFolder(Format.java:32)
at tachyon.Format.main(Format.java:59)
Caused by: java.lang.ClassNotFoundException: org.jets3t.service.S3ServiceException
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 13 more
Should I change my jets3t jar file, or is it something else? The question may be really basic, but that is exactly my level right now; I have only run some basic tests with Tachyon so far.
I would be glad for any help!
The problem is that accessing S3 requires more dependencies than are packaged by default. For Tachyon 0.5.0 to work with Hadoop 1.0.4, you first have to export the paths to:
* jets3t 0.7.1
* commons-httpclient 3.1
You can do it like this (hijacking the TACHYON_CLASSPATH variable):
export TACHYON_CLASSPATH=~/.m2/repository/commons-httpclient/commons-httpclient/3.1/commons-httpclient-3.1.jar:~/.m2/repository/net/java/dev/jets3t/jets3t/0.7.1/jets3t-0.7.1.jar
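If the jars sit in a local Maven repository, the same export can be assembled from variables instead of one long line. This is only a sketch: the paths and versions are the ones from the export above, and the standard ~/.m2 layout is an assumption that may not match your machine.

```shell
#!/usr/bin/env bash
# Sketch: build TACHYON_CLASSPATH from individual jar paths.
# Assumes a standard ~/.m2 Maven repository layout; adjust M2 if yours differs.
M2="$HOME/.m2/repository"
HTTPCLIENT_JAR="$M2/commons-httpclient/commons-httpclient/3.1/commons-httpclient-3.1.jar"
JETS3T_JAR="$M2/net/java/dev/jets3t/jets3t/0.7.1/jets3t-0.7.1.jar"

# Classpath entries are colon-separated on Linux and OS X.
export TACHYON_CLASSPATH="$HTTPCLIENT_JAR:$JETS3T_JAR"
echo "$TACHYON_CLASSPATH"
```

The upside over a single hard-coded line is that bumping a version means touching one variable.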
Then you also have to use this variable in the start script: everywhere TACHYON_JAR is used with -cp, prepend TACHYON_CLASSPATH.
Example:
(nohup $JAVA -cp $TACHYON_JAR -Dtachyon.home=$TACHYON_HOME -Dtachyon.logger.type="MASTER_LOGGER" -Dlog4j.configuration=file:$TACHYON_CONF_DIR/log4j.properties $TACHYON_MASTER_JAVA_OPTS tachyon.master.TachyonMaster > /dev/null 2>&1) &
becomes
(nohup $JAVA -cp $TACHYON_CLASSPATH:$TACHYON_JAR -Dtachyon.home=$TACHYON_HOME -Dtachyon.logger.type="MASTER_LOGGER" -Dlog4j.configuration=file:$TACHYON_CONF_DIR/log4j.properties $TACHYON_MASTER_JAVA_OPTS tachyon.master.TachyonMaster > /dev/null 2>&1) &
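Rather than editing every launch line by hand, the same prepend can be applied with sed. This is only a sketch: demo-start.sh below is a stand-in for the real start script, and you should inspect the result (and the .bak backup) before relying on it.

```shell
#!/usr/bin/env bash
# Sketch: rewrite every `-cp $TACHYON_JAR` into `-cp $TACHYON_CLASSPATH:$TACHYON_JAR`.
# demo-start.sh stands in for the real start script shipped with Tachyon.
script=demo-start.sh
printf '%s\n' '(nohup $JAVA -cp $TACHYON_JAR tachyon.master.TachyonMaster > /dev/null 2>&1) &' > "$script"

# -i.bak keeps a backup of the original; \$ makes sed match the dollar sign literally.
sed -i.bak 's/-cp \$TACHYON_JAR/-cp $TACHYON_CLASSPATH:$TACHYON_JAR/g' "$script"
cat "$script"
```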
Finally, you can format your s3 bucket and start tachyon:
./bin/tachyon format
./bin/tachyon-start local
A warning about the s3n scheme: I've looked deeper into the code and found some weird stuff regarding S3 credentials. Apparently, only the s3n scheme will work (because only this flavor is injected/copied into the conf). Hence, the under-FS URL for S3 must use the s3n scheme, and so must your Java properties in tachyon-env.sh.
Luckily, you're already okay, but others might not be.
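As a small guard against this pitfall, you could validate the scheme in tachyon-env.sh before exporting it. A minimal sketch, using the address from the question's config; the check itself is my suggestion, not part of Tachyon.

```shell
#!/usr/bin/env bash
# Sketch: fail fast if the under-FS address uses s3:// instead of s3n://.
TACHYON_UNDERFS_ADDRESS=s3n://test   # value taken from the tachyon-env.sh above

case "$TACHYON_UNDERFS_ADDRESS" in
  s3n://*) echo "scheme ok: $TACHYON_UNDERFS_ADDRESS" ;;
  s3://*)  echo "error: use the s3n:// scheme, not s3://" >&2; exit 1 ;;
  *)       echo "not an S3 address; skipping check" ;;
esac
```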