mysqlapache-sparkamazon-s3hivemetastore

Hive 1.2 Metastore Service doesn't start after configuring it to S3 storage instead HDFS


I have an Apache Spark Cluster(2.2.0) in standalone mode. Till now was running using HDFS to store the parquet files. I'm using the Hive Metastore Service of Apache Hive 1.2 to access, using the Thriftserver, Spark over JDBC.

Now I want to use S3 Object Storage instead HDFS. I have added the following configuration to my hive-site.xml:

<property>
  <name>fs.s3a.access.key</name>
  <value>access_key</value>
  <description>Profitbricks Access Key</description>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>secret_key</value>
  <description>Profitbricks Secret Key</description>
</property>
<property>
  <name>fs.s3a.endpoint</name>
  <value>s3-de-central.profitbricks.com</value>
  <description>ProfitBricks S3 Object Storage Endpoint</description>
</property>
<property>
  <name>fs.s3a.endpoint.http.port</name>
  <value>80</value>
  <description>ProfitBricks S3 Object Storage Endpoint HTTP Port</description>
</property>
<property>
  <name>fs.s3a.endpoint.https.port</name>
  <value>443</value>
  <description>ProfitBricks S3 Object Storage Endpoint HTTPS Port</description>
</property>
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>s3a://dev.spark.my_bucket/parquet/</value>
  <description>Profitbricks S3 Object Storage Hive Warehouse Location</description>
</property>

I have the hive metastore in a MySQL 5.7 database. I have added to the Hive lib folder the following jar files:

I have deleted the old hive metastore schema on MySQL and then I start the metastore service with the following command: hive --service metastore & and I get the following error:

java.lang.NoClassDefFoundError: com/fasterxml/jackson/databind/ObjectMapper
        at com.amazonaws.util.json.Jackson.<clinit>(Jackson.java:27)
        at com.amazonaws.internal.config.InternalConfig.loadfrom(InternalConfig.java:182)
        at com.amazonaws.internal.config.InternalConfig.load(InternalConfig.java:199)
        at com.amazonaws.internal.config.InternalConfig$Factory.<clinit>(InternalConfig.java:232)
        at com.amazonaws.ServiceNameFactory.getServiceName(ServiceNameFactory.java:34)
        at com.amazonaws.AmazonWebServiceClient.computeServiceName(AmazonWebServiceClient.java:703)
        at com.amazonaws.AmazonWebServiceClient.getServiceNameIntern(AmazonWebServiceClient.java:676)
        at com.amazonaws.AmazonWebServiceClient.computeSignerByURI(AmazonWebServiceClient.java:278)
        at com.amazonaws.AmazonWebServiceClient.setEndpoint(AmazonWebServiceClient.java:160)
        at com.amazonaws.services.s3.AmazonS3Client.setEndpoint(AmazonS3Client.java:475)
        at com.amazonaws.services.s3.AmazonS3Client.init(AmazonS3Client.java:447)
        at com.amazonaws.services.s3.AmazonS3Client.<init>(AmazonS3Client.java:391)
        at com.amazonaws.services.s3.AmazonS3Client.<init>(AmazonS3Client.java:371)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:235)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2811)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2848)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2830)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:356)
        at org.apache.hadoop.hive.metastore.Warehouse.getFs(Warehouse.java:104)
        at org.apache.hadoop.hive.metastore.Warehouse.getDnsPath(Warehouse.java:140)
        at org.apache.hadoop.hive.metastore.Warehouse.getDnsPath(Warehouse.java:146)
        at org.apache.hadoop.hive.metastore.Warehouse.getWhRoot(Warehouse.java:159)
        at org.apache.hadoop.hive.metastore.Warehouse.getDefaultDatabasePath(Warehouse.java:177)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB_core(HiveMetaStore.java:601)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:620)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:66)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72)
        at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762)
        at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5757)
        at org.apache.hadoop.hive.metastore.HiveMetaStore.startMetaStore(HiveMetaStore.java:5990)
        at org.apache.hadoop.hive.metastore.HiveMetaStore.main(HiveMetaStore.java:5915)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:234)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
Caused by: java.lang.ClassNotFoundException: com.fasterxml.jackson.databind.ObjectMapper
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)

The missing class belongs to the Jackson library, then I have copied the Jackson-*.jar located on my spark-2.2.0-bin-hadoop2.7/jars/ folder which are:

But then I got the following error:

2018-01-05 17:51:00,819 ERROR [main]: metastore.HiveMetaStore (HiveMetaStore.java:main(5920)) - Metastore Thrift Server threw an exception...
java.lang.NumberFormatException: For input string: "100M"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Long.parseLong(Long.java:589)
        at java.lang.Long.parseLong(Long.java:631)
        at org.apache.hadoop.conf.Configuration.getLong(Configuration.java:1319)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:248)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2811)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2848)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2830)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:356)
        at org.apache.hadoop.hive.metastore.Warehouse.getFs(Warehouse.java:104)
        at org.apache.hadoop.hive.metastore.Warehouse.getDnsPath(Warehouse.java:140)
        at org.apache.hadoop.hive.metastore.Warehouse.getDnsPath(Warehouse.java:146)
        at org.apache.hadoop.hive.metastore.Warehouse.getWhRoot(Warehouse.java:159)
        at org.apache.hadoop.hive.metastore.Warehouse.getDefaultDatabasePath(Warehouse.java:177)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB_core(HiveMetaStore.java:601)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:620)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:66)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72)
        at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762)
        at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5757)
        at org.apache.hadoop.hive.metastore.HiveMetaStore.startMetaStore(HiveMetaStore.java:5990)
        at org.apache.hadoop.hive.metastore.HiveMetaStore.main(HiveMetaStore.java:5915)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:234)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:148)

I think the error here it have something to do with some jar version incompatibility but I'm not able to find the correct versions.

Can someone help me here?


Solution

    1. You absolutely cannot mix versions of the Hadoop-common, hadoop-aws, aws-s3-sdk and jackson versions from what everything expects, or you will see stack traces.
    2. And its all open source, so if you D/L all the source JARs locally, your IDE will help you find what's causing the stack trace. This is what we all do. It's not magic, modern IDEs (intellij IDEA) even have special stack debugging.

    This one is coming in because the value of fs.s3a.multipart.size set in hadoop-common's /core-default.xml resource is 100M, which came in with HADOOP-13680 and the range parsing handling numbers like "100M" instead of 104857600 . This stack trace says "Hadoop 2.8+ configuration"

    You could try setting the property in your configs to that numeric value, but its a warning sign that versions of JARs are out of sync and you will probably only get a few lines further before something else breaks.

    Fix: make sure that hadoop-common.jar and hadoop-aws.jar are in sync. It looks like you've got the jackson and aws ones lined up, though jackson is complex enough you can never take that for granted.