Some details:
ThriftCLIService: Starting ThriftBinaryCLIService on port 10000 with 5...500 worker threads
BI tool is connect via odbc driver
After activating Spark Thrift Server I'm unable to run pyspark script using spark-submit as they both use the same metastore_db
error:
Caused by: ERROR XJ040: Failed to start database 'metastore_db' with class loader org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@3acaa384, see the next exception for details.
at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
at org.apache.derby.impl.jdbc.SQLExceptionFactory.wrapArgsForTransportAcrossDRDA(Unknown Source)
... 140 more
Caused by: ERROR XSDB6: Another instance of Derby may have already booted the database /tmp/metastore_db.
I need to be able to run PySpark (Spark ETL) while having spark thrift server up for BI tool queries. Any workaround for it?
Thanks!
In my case the solution was to move the metastore_db to a database server like MySql (in my case) or Postgresql.
You will have to configure $SPARK_HOME/conf/hive-site.xml and include your jdbc driver in $SPARK_HOME/jars path
hive-site.xml example for MySQL connection
<configuration>
<!-- Hive Execution Parameters -->
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://XXX.XXX.XXX.XXX:3306/metastore?createDatabaseIfNotExist=true&useSSL=FALSE&autoReconnect=true&nullCatalogMeansCurrent=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>YOUR_USER</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>YOUR_PASSWORD</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.cj.jdbc.Driver</value>
</property>
<property>
<name>hive.server2.transport.mode</name>
<value>http</value>
</property>
<property>
<name>hive.server2.thrift.http.port</name>
<value>10000</value>
</property>
<property>
<name>hive.server2.http.endpoint</name>
<value>cliservice</value>
</property>
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
<description/>
</property>
<property>
<name>datanucleus.autoCreateSchema</name>
<value>true</value>
</property>
<property>
<name>datanucleus.fixedDatastore</name>
<value>true</value>
</property>
<property>
<name>datanucleus.autoCreateTables</name>
<value>true</value>
</property>
<property>
<name>datanucleus.schema.autoCreateTables</name>
<value>true</value>
</property>
</configuration>