If I run DBI::dbGetQuery(sc, "SHOW DATABASES")
in R, I get as result only default
database.
And not the full list of hive tables created from the hive>
command line...
Also in the R project dir, get's created a derby.log and metastore_db folder.
So my guess is that sparklyr's spark session is no using the global hive config...
I'm using Spark 3.3.0, Sparklyr 1.7.8 and MySQL for metastore...
I have tried changing sql.warehouse.dir
to the value of hive's hive.metastore.warehouse.dir
which is "/user/hive/warehouse"
and sql.catalogImplementation
to "hive"
.
options(sparklyr.log.console = TRUE)
sc_config <- spark_config()
sc_config$spark.sql.warehouse.dir <- "/user/hive/warehouse"
sc_config$spark.sql.catalogImplementation <- "hive"
sc <- spark_connect(master = "yarn", spark_home = "/home/ml/spark", app_name = "TestAPP", config = sc_config)
sparklyr::hive_context_config(sc)
This is the log from > sparklyr.log.console = TRUE
:
22/10/18 11:11:43 INFO sparklyr: Session (97754) is starting under 127.0.0.1 port 8880
22/10/18 11:11:43 INFO sparklyr: Session (97754) found port 8880 is available
22/10/18 11:11:43 INFO sparklyr: Gateway (97754) is waiting for sparklyr client to connect to port 8880
22/10/18 11:11:43 INFO sparklyr: Gateway (97754) accepted connection
22/10/18 11:11:43 INFO sparklyr: Gateway (97754) is waiting for sparklyr client to connect to port 8880
22/10/18 11:11:43 INFO sparklyr: Gateway (97754) received command 0
22/10/18 11:11:43 INFO sparklyr: Gateway (97754) found requested session matches current session
22/10/18 11:11:43 INFO sparklyr: Gateway (97754) is creating backend and allocating system resources
22/10/18 11:11:43 INFO sparklyr: Gateway (97754) is using port 8881 for backend channel
22/10/18 11:11:44 INFO sparklyr: Gateway (97754) created the backend
22/10/18 11:11:44 INFO sparklyr: Gateway (97754) is waiting for R process to end
22/10/18 11:11:46 INFO HiveConf: Found configuration file null
22/10/18 11:11:46 INFO SparkContext: Running Spark version 3.3.0
22/10/18 11:11:46 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/10/18 11:11:47 INFO ResourceUtils: ==============================================================
22/10/18 11:11:47 INFO ResourceUtils: No custom resources configured for spark.driver.
22/10/18 11:11:47 INFO ResourceUtils: ==============================================================
22/10/18 11:11:47 INFO SparkContext: Submitted application: TestAPP
22/10/18 11:11:47 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 512, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
22/10/18 11:11:47 INFO ResourceProfile: Limiting resource is cpus at 1 tasks per executor
22/10/18 11:11:47 INFO ResourceProfileManager: Added ResourceProfile id: 0
22/10/18 11:11:48 INFO SecurityManager: Changing view acls to: ml
22/10/18 11:11:48 INFO SecurityManager: Changing modify acls to: ml
22/10/18 11:11:48 INFO SecurityManager: Changing view acls groups to:
22/10/18 11:11:48 INFO SecurityManager: Changing modify acls groups to:
22/10/18 11:11:48 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ml); groups with view permissions: Set(); users with modify permissions: Set(ml); groups with modify permissions: Set()
22/10/18 11:11:48 INFO Utils: Successfully started service 'sparkDriver' on port 38889.
22/10/18 11:11:48 INFO SparkEnv: Registering MapOutputTracker
22/10/18 11:11:48 INFO SparkEnv: Registering BlockManagerMaster
22/10/18 11:11:48 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
22/10/18 11:11:48 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
22/10/18 11:11:48 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
22/10/18 11:11:49 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-65ec8b4e-6131-4fed-a227-ea5b2162e4d8
22/10/18 11:11:49 INFO MemoryStore: MemoryStore started with capacity 93.3 MiB
22/10/18 11:11:49 INFO SparkEnv: Registering OutputCommitCoordinator
22/10/18 11:11:50 INFO Utils: Successfully started service 'SparkUI' on port 4040.
22/10/18 11:11:50 INFO SparkContext: Added JAR file:/home/ml/R/x86_64-pc-linux-gnu-library/4.2/sparklyr/java/sparklyr-master-2.12.jar at spark://master:38889/jars/sparklyr-master-2.12.jar with timestamp 1666116706621
22/10/18 11:11:51 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at /0.0.0.0:8032
22/10/18 11:11:53 INFO Configuration: resource-types.xml not found
22/10/18 11:11:53 INFO ResourceUtils: Unable to find 'resource-types.xml'.
22/10/18 11:11:53 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
22/10/18 11:11:53 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
22/10/18 11:11:53 INFO Client: Setting up container launch context for our AM
22/10/18 11:11:53 INFO Client: Setting up the launch environment for our AM container
22/10/18 11:11:53 INFO Client: Preparing resources for our AM container
22/10/18 11:11:53 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
22/10/18 11:12:03 INFO Client: Uploading resource file:/tmp/spark-71575ad6-a8f7-43c0-974e-7c751281ef51/__spark_libs__890394313143327111.zip -> file:/home/ml/.sparkStaging/application_1665674177007_0028/__spark_libs__890394313143327111.zip
22/10/18 11:12:07 INFO Client: Uploading resource file:/tmp/spark-71575ad6-a8f7-43c0-974e-7c751281ef51/__spark_conf__9152665720324853254.zip -> file:/home/ml/.sparkStaging/application_1665674177007_0028/__spark_conf__.zip
22/10/18 11:12:08 INFO SecurityManager: Changing view acls to: ml
22/10/18 11:12:08 INFO SecurityManager: Changing modify acls to: ml
22/10/18 11:12:08 INFO SecurityManager: Changing view acls groups to:
22/10/18 11:12:08 INFO SecurityManager: Changing modify acls groups to:
22/10/18 11:12:08 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ml); groups with view permissions: Set(); users with modify permissions: Set(ml); groups with modify permissions: Set()
22/10/18 11:12:08 INFO Client: Submitting application application_1665674177007_0028 to ResourceManager
22/10/18 11:12:08 INFO YarnClientImpl: Submitted application application_1665674177007_0028
22/10/18 11:12:09 INFO Client: Application report for application_1665674177007_0028 (state: ACCEPTED)
22/10/18 11:12:09 INFO Client:
client token: N/A
diagnostics: [Tue Oct 18 11:12:08 -0700 2022] Application is Activated, waiting for resources to be assigned for AM. Details : AM Partition = <DEFAULT_PARTITION> ; Partition Resource = <memory:16384, vCores:16> ; Queue's Absolute capacity = 100.0 % ; Queue's Absolute used capacity = 0.0 % ; Queue's Absolute max capacity = 100.0 % ; Queue's capacity (absolute resource) = <memory:16384, vCores:16> ; Queue's used capacity (absolute resource) = <memory:0, vCores:0> ; Queue's max capacity (absolute resource) = <memory:16384, vCores:16> ;
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1666116728172
final status: UNDEFINED
tracking URL: http://master:8088/proxy/application_1665674177007_0028/
user: ml
22/10/18 11:12:10 INFO Client: Application report for application_1665674177007_0028 (state: ACCEPTED)
22/10/18 11:12:11 INFO Client: Application report for application_1665674177007_0028 (state: ACCEPTED)
22/10/18 11:12:12 INFO Client: Application report for application_1665674177007_0028 (state: ACCEPTED)
22/10/18 11:12:13 INFO Client: Application report for application_1665674177007_0028 (state: ACCEPTED)
22/10/18 11:12:14 INFO Client: Application report for application_1665674177007_0028 (state: ACCEPTED)
22/10/18 11:12:15 INFO Client: Application report for application_1665674177007_0028 (state: ACCEPTED)
22/10/18 11:12:16 INFO Client: Application report for application_1665674177007_0028 (state: ACCEPTED)
22/10/18 11:12:17 INFO Client: Application report for application_1665674177007_0028 (state: ACCEPTED)
22/10/18 11:12:18 INFO Client: Application report for application_1665674177007_0028 (state: ACCEPTED)
22/10/18 11:12:19 INFO Client: Application report for application_1665674177007_0028 (state: ACCEPTED)
22/10/18 11:12:20 INFO Client: Application report for application_1665674177007_0028 (state: ACCEPTED)
22/10/18 11:12:21 INFO Client: Application report for application_1665674177007_0028 (state: ACCEPTED)
22/10/18 11:12:22 INFO Client: Application report for application_1665674177007_0028 (state: ACCEPTED)
22/10/18 11:12:23 INFO Client: Application report for application_1665674177007_0028 (state: RUNNING)
22/10/18 11:12:23 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 192.168.1.82
ApplicationMaster RPC port: -1
queue: default
start time: 1666116728172
final status: UNDEFINED
tracking URL: http://master:8088/proxy/application_1665674177007_0028/
user: ml
22/10/18 11:12:23 INFO YarnClientSchedulerBackend: Application application_1665674177007_0028 has started running.
22/10/18 11:12:23 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43035.
22/10/18 11:12:23 INFO NettyBlockTransferService: Server created on master:43035
22/10/18 11:12:23 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
22/10/18 11:12:23 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, master, 43035, None)
22/10/18 11:12:23 INFO BlockManagerMasterEndpoint: Registering block manager master:43035 with 93.3 MiB RAM, BlockManagerId(driver, master, 43035, None)
22/10/18 11:12:23 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, master, 43035, None)
22/10/18 11:12:23 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, master, 43035, None)
22/10/18 11:12:23 INFO YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> master, PROXY_URI_BASES -> http://master:8088/proxy/application_1665674177007_0028), /proxy/application_1665674177007_0028
22/10/18 11:12:24 INFO ServerInfo: Adding filter to /jobs: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
22/10/18 11:12:24 INFO ServerInfo: Adding filter to /jobs/json: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
22/10/18 11:12:24 INFO ServerInfo: Adding filter to /jobs/job: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
22/10/18 11:12:24 INFO ServerInfo: Adding filter to /jobs/job/json: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
22/10/18 11:12:24 INFO ServerInfo: Adding filter to /stages: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
22/10/18 11:12:24 INFO ServerInfo: Adding filter to /stages/json: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
22/10/18 11:12:24 INFO ServerInfo: Adding filter to /stages/stage: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
22/10/18 11:12:24 INFO ServerInfo: Adding filter to /stages/stage/json: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
22/10/18 11:12:24 INFO ServerInfo: Adding filter to /stages/pool: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
22/10/18 11:12:24 INFO ServerInfo: Adding filter to /stages/pool/json: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
22/10/18 11:12:24 INFO ServerInfo: Adding filter to /storage: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
22/10/18 11:12:24 INFO ServerInfo: Adding filter to /storage/json: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
22/10/18 11:12:24 INFO ServerInfo: Adding filter to /storage/rdd: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
22/10/18 11:12:24 INFO ServerInfo: Adding filter to /storage/rdd/json: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
22/10/18 11:12:24 INFO ServerInfo: Adding filter to /environment: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
22/10/18 11:12:24 INFO ServerInfo: Adding filter to /environment/json: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
22/10/18 11:12:24 INFO ServerInfo: Adding filter to /executors: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
22/10/18 11:12:24 INFO ServerInfo: Adding filter to /executors/json: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
22/10/18 11:12:24 INFO ServerInfo: Adding filter to /executors/threadDump: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
22/10/18 11:12:24 INFO ServerInfo: Adding filter to /executors/threadDump/json: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
22/10/18 11:12:24 INFO ServerInfo: Adding filter to /static: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
22/10/18 11:12:25 INFO ServerInfo: Adding filter to /: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
22/10/18 11:12:25 INFO ServerInfo: Adding filter to /api: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
22/10/18 11:12:25 INFO ServerInfo: Adding filter to /jobs/job/kill: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
22/10/18 11:12:25 INFO ServerInfo: Adding filter to /stages/stage/kill: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
22/10/18 11:12:25 INFO ServerInfo: Adding filter to /metrics/json: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
22/10/18 11:12:25 INFO YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000000000(ns)
22/10/18 11:12:25 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir.
22/10/18 11:12:25 INFO SharedState: Warehouse path is 'file:/user/hive/warehouse'.
22/10/18 11:12:25 INFO ServerInfo: Adding filter to /SQL: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
22/10/18 11:12:25 INFO ServerInfo: Adding filter to /SQL/json: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
22/10/18 11:12:25 INFO ServerInfo: Adding filter to /SQL/execution: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
22/10/18 11:12:25 INFO ServerInfo: Adding filter to /SQL/execution/json: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
22/10/18 11:12:25 INFO ServerInfo: Adding filter to /static/sql: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
22/10/18 11:12:25 INFO YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(spark-client://YarnAM)
22/10/18 11:12:29 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Requesting driver to remove executor 1 for reason Container from a bad node: container_1665674177007_0028_02_000002 on host: worker1. Exit status: -1000. Diagnostics: [2022-10-18 11:12:26.949]File file:/home/ml/.sparkStaging/application_1665674177007_0028/__spark_libs__890394313143327111.zip does not exist
java.io.FileNotFoundException: File file:/home/ml/.sparkStaging/application_1665674177007_0028/__spark_libs__890394313143327111.zip does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:779)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:1100)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:769)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:462)
at org.apache.hadoop.yarn.util.FSDownload.verifyAndCopy(FSDownload.java:271)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:68)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:415)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:412)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:412)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.doDownloadCall(ContainerLocalizer.java:247)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:240)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:228)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
.
22/10/18 11:12:29 INFO BlockManagerMaster: Removal of executor 1 requested
22/10/18 11:12:29 INFO YarnSchedulerBackend$YarnDriverEndpoint: Asked to remove non-existent executor 1
22/10/18 11:12:29 INFO BlockManagerMasterEndpoint: Trying to remove executor 1 from BlockManagerMaster.
22/10/18 11:12:39 INFO HiveUtils: Initializing HiveMetastoreConnection version 2.3.9 using Spark classes.
22/10/18 11:12:40 INFO HiveClientImpl: Warehouse location for Hive client (version 2.3.9) is file:/user/hive/warehouse
22/10/18 11:12:41 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.1.82:43560) with ID 2, ResourceProfileId 0
22/10/18 11:12:42 INFO BlockManagerMasterEndpoint: Registering block manager master:40397 with 93.3 MiB RAM, BlockManagerId(2, master, 40397, None)
22/10/18 11:12:49 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.1.82:43600) with ID 3, ResourceProfileId 0
22/10/18 11:12:50 INFO BlockManagerMasterEndpoint: Registering block manager master:44035 with 93.3 MiB RAM, BlockManagerId(3, master, 44035, None)
And this is the print from > sparklyr::hive_context_config(sc)
: https://pastebin.com/e28KJ4wQ
Any help? Thanks in advance.
Okay so I found the solution on this other question.
I added this property to my hive-site.xml
and also copied it to $HOME_SPARK/conf/
<property>
<name>hive.metastore.uris</name>
<value>thrift://localhost:9083</value>
</property>
Also I removed all spark_config()
configs that I tried before.
I would love to know why was this the solution.