I have an Apache Spark Cluster(2.2.0) in standalone mode. Till now was running using HDFS to store the parquet files. I'm using the Hive Metastore Service of Apache Hive 1.2 to access, using the Thriftserver, Spark over JDBC.
Now I want to use S3 Object Storage instead HDFS. I have added the following configuration to my hive-site.xml:
<property>
<name>fs.s3a.access.key</name>
<value>access_key</value>
<description>Profitbricks Access Key</description>
</property>
<property>
<name>fs.s3a.secret.key</name>
<value>secret_key</value>
<description>Profitbricks Secret Key</description>
</property>
<property>
<name>fs.s3a.endpoint</name>
<value>s3-de-central.profitbricks.com</value>
<description>ProfitBricks S3 Object Storage Endpoint</description>
</property>
<property>
<name>fs.s3a.endpoint.http.port</name>
<value>80</value>
<description>ProfitBricks S3 Object Storage Endpoint HTTP Port</description>
</property>
<property>
<name>fs.s3a.endpoint.https.port</name>
<value>443</value>
<description>ProfitBricks S3 Object Storage Endpoint HTTPS Port</description>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>s3a://dev.spark.my_bucket/parquet/</value>
<description>Profitbricks S3 Object Storage Hive Warehouse Location</description>
</property>
I have the hive metastore in a MySQL 5.7 database. I have added to the Hive lib folder the following jar files:
I have deleted the old hive metastore schema on MySQL and then I start the metastore service with the following command: hive --service metastore &
and I get the following error:
java.lang.NoClassDefFoundError: com/fasterxml/jackson/databind/ObjectMapper
at com.amazonaws.util.json.Jackson.<clinit>(Jackson.java:27)
at com.amazonaws.internal.config.InternalConfig.loadfrom(InternalConfig.java:182)
at com.amazonaws.internal.config.InternalConfig.load(InternalConfig.java:199)
at com.amazonaws.internal.config.InternalConfig$Factory.<clinit>(InternalConfig.java:232)
at com.amazonaws.ServiceNameFactory.getServiceName(ServiceNameFactory.java:34)
at com.amazonaws.AmazonWebServiceClient.computeServiceName(AmazonWebServiceClient.java:703)
at com.amazonaws.AmazonWebServiceClient.getServiceNameIntern(AmazonWebServiceClient.java:676)
at com.amazonaws.AmazonWebServiceClient.computeSignerByURI(AmazonWebServiceClient.java:278)
at com.amazonaws.AmazonWebServiceClient.setEndpoint(AmazonWebServiceClient.java:160)
at com.amazonaws.services.s3.AmazonS3Client.setEndpoint(AmazonS3Client.java:475)
at com.amazonaws.services.s3.AmazonS3Client.init(AmazonS3Client.java:447)
at com.amazonaws.services.s3.AmazonS3Client.<init>(AmazonS3Client.java:391)
at com.amazonaws.services.s3.AmazonS3Client.<init>(AmazonS3Client.java:371)
at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:235)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2811)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2848)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2830)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:356)
at org.apache.hadoop.hive.metastore.Warehouse.getFs(Warehouse.java:104)
at org.apache.hadoop.hive.metastore.Warehouse.getDnsPath(Warehouse.java:140)
at org.apache.hadoop.hive.metastore.Warehouse.getDnsPath(Warehouse.java:146)
at org.apache.hadoop.hive.metastore.Warehouse.getWhRoot(Warehouse.java:159)
at org.apache.hadoop.hive.metastore.Warehouse.getDefaultDatabasePath(Warehouse.java:177)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB_core(HiveMetaStore.java:601)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:620)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:66)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72)
at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762)
at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5757)
at org.apache.hadoop.hive.metastore.HiveMetaStore.startMetaStore(HiveMetaStore.java:5990)
at org.apache.hadoop.hive.metastore.HiveMetaStore.main(HiveMetaStore.java:5915)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:234)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
Caused by: java.lang.ClassNotFoundException: com.fasterxml.jackson.databind.ObjectMapper
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
The missing class belongs to the Jackson library, then I have copied the Jackson-*.jar located on my spark-2.2.0-bin-hadoop2.7/jars/ folder which are:
But then I got the following error:
2018-01-05 17:51:00,819 ERROR [main]: metastore.HiveMetaStore (HiveMetaStore.java:main(5920)) - Metastore Thrift Server threw an exception...
java.lang.NumberFormatException: For input string: "100M"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:589)
at java.lang.Long.parseLong(Long.java:631)
at org.apache.hadoop.conf.Configuration.getLong(Configuration.java:1319)
at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:248)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2811)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2848)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2830)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:356)
at org.apache.hadoop.hive.metastore.Warehouse.getFs(Warehouse.java:104)
at org.apache.hadoop.hive.metastore.Warehouse.getDnsPath(Warehouse.java:140)
at org.apache.hadoop.hive.metastore.Warehouse.getDnsPath(Warehouse.java:146)
at org.apache.hadoop.hive.metastore.Warehouse.getWhRoot(Warehouse.java:159)
at org.apache.hadoop.hive.metastore.Warehouse.getDefaultDatabasePath(Warehouse.java:177)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB_core(HiveMetaStore.java:601)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:620)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:66)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72)
at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762)
at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5757)
at org.apache.hadoop.hive.metastore.HiveMetaStore.startMetaStore(HiveMetaStore.java:5990)
at org.apache.hadoop.hive.metastore.HiveMetaStore.main(HiveMetaStore.java:5915)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:234)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
I think the error here it have something to do with some jar version incompatibility but I'm not able to find the correct versions.
Can someone help me here?
This one is coming in because the value of fs.s3a.multipart.size
set in hadoop-common's /core-default.xml
resource is 100M, which came in with HADOOP-13680 and the range parsing handling numbers like "100M" instead of 104857600 . This stack trace says "Hadoop 2.8+ configuration"
You could try setting the property in your configs to that numeric value, but its a warning sign that versions of JARs are out of sync and you will probably only get a few lines further before something else breaks.
Fix: make sure that hadoop-common.jar
and hadoop-aws.jar
are in sync. It looks like you've got the jackson and aws ones lined up, though jackson is complex enough you can never take that for granted.