I've been following the five-minute how-to for setting up an HTAP database with tidb_tispark, and everything works until I get to the "Launch TiSpark" section. My first issue occurs when executing the line:
docker-compose exec tispark-master /opt/spark-2.1.1-bin-hadoop2.7/bin/spark-shell
But I got around that by changing the Spark version in the path to the one I found inside the container:
docker-compose exec tispark-master /opt/spark-2.3.3-bin-hadoop2.7/bin/spark-shell
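(For anyone else who hits this: I found the installed version by listing the directories inside the container, e.g.:)
docker-compose exec tispark-master ls /opt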
My second issue occurs when executing the three-line block:
import org.apache.spark.sql.TiContext
val ti = new TiContext(spark)
ti.tidbMapDatabase("TPCH_001")
When I run the last statement, I get the following output:
scala> ti.tidbMapDatabase("TPCH_001")
2019-07-11 16:14:32 WARN General:96 - Plugin (Bundle) "org.datanucleus" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/opt/spark/jars/datanucleus-core-3.2.10.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/opt/spark-2.3.3-bin-hadoop2.7/jars/datanucleus-core-3.2.10.jar."
2019-07-11 16:14:32 WARN General:96 - Plugin (Bundle) "org.datanucleus.api.jdo" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/opt/spark/jars/datanucleus-api-jdo-3.2.6.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/opt/spark-2.3.3-bin-hadoop2.7/jars/datanucleus-api-jdo-3.2.6.jar."
2019-07-11 16:14:32 WARN General:96 - Plugin (Bundle) "org.datanucleus.store.rdbms" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/opt/spark/jars/datanucleus-rdbms-3.2.9.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/opt/spark-2.3.3-bin-hadoop2.7/jars/datanucleus-rdbms-3.2.9.jar."
2019-07-11 16:14:36 WARN ObjectStore:568 - Failed to get database global_temp, returning NoSuchObjectException
This doesn't prevent me from running the query:
spark.sql("select * from nation").show(30);
But when I follow the tutorial's further steps and modify the database from MySQL, the changes are not reflected immediately in Spark. Furthermore, at some later point (I believe > 5 minutes), the modified row stops showing up in Spark SQL queries entirely.
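For reference, this is roughly how I observe the staleness (the nation table comes from the tutorial's TPCH_001 data; the specific UPDATE is just an illustration):

// After running something like this from the MySQL client:
//   UPDATE nation SET N_NAME = 'TEST' WHERE N_NATIONKEY = 0;
// spark-shell still returns the old value:
spark.sql("select N_NATIONKEY, N_NAME from nation where N_NATIONKEY = 0").show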
I'm rather new to this kind of setup and don't really know how to debug this issue. Searches for the warnings I received weren't illuminating.
I don't know if it's helpful, but when I connect with the MySQL client, this is the server version I get:
Server version: 5.7.25-TiDB-v3.0.0-rc.1-309-g8c20289c7 MySQL Community Server (Apache License 2.0)
I'm one of the main developers of TiSpark. Sorry for the bad experience you've had with it.
Due to a Docker problem on my side, I cannot directly reproduce your issue, but it seems you hit one of the bugs fixed recently: https://github.com/pingcap/tispark/pull/862/files
In newer versions of TiSpark (TiSpark 2.0+ with Spark 2.3+), databases and tables are hooked directly into the catalog service, so you can call:
spark.sql("use TPCH_001").show
spark.sql("select * from nation").show
This should give you fresh data. So restart your Spark driver, try just the two lines of code above, and see if it works.
Let me know if this fixes your problem. Meanwhile, we will check our Docker image to make sure it already contains the fix.
If things still go wrong, please run the code below and let us know your TiSpark version:
spark.sql("select ti_version()").show
Again, sorry for the trouble, and thanks for trying it out.
EDIT
To address your comment: the warning appears because Spark itself first tries to locate the database in its native catalog, which produces the "Failed to get database" warning. The failover process then delegates the lookup to TiSpark, which behaves correctly, so the warning can be safely ignored. It's recommended to add the line below to the log4j.properties file in your Spark conf folder:
log4j.logger.org.apache.hadoop.hive.metastore.ObjectStore=ERROR
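If you'd rather not edit the properties file, you can also raise that logger's level at runtime inside spark-shell (a minimal sketch, assuming the log4j 1.x API that Spark 2.3 bundles):

// Suppress the harmless "Failed to get database" warning for the current session only
org.apache.log4j.Logger.getLogger("org.apache.hadoop.hive.metastore.ObjectStore").setLevel(org.apache.log4j.Level.ERROR)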
We will polish the Docker tutorial image soon. Thank you so much for trying it.