My Spark project (written in Java) needs to access different tables (the results of SELECT queries) across executors.
One solution to this problem is to convert each `DataFrame` to a `Map`. However, building a `Map` of large size and passing it to the executors as a broadcast variable doesn't sound efficient. Instead, can we load the tables in memory using `load`, so that they can be shared across executors?
Is the method `void org.apache.spark.sql.Dataset.createOrReplaceTempView(String viewName)` or `void org.apache.spark.sql.Dataset.createGlobalTempView(String viewName) throws AnalysisException` useful for this purpose?
Spark version: 2.3.0
You can broadcast a DataFrame. See documentation
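
As a rough illustration, here is a minimal sketch of what broadcasting a DataFrame could look like in Java. It assumes the intended API is the join hint `org.apache.spark.sql.functions.broadcast` (the linked documentation is not reproduced here, so this is a guess) and that the lookup table is small enough to fit in each executor's memory. The class name, column names, and sample rows are hypothetical stand-ins for the real tables.

```java
import java.util.Arrays;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

import static org.apache.spark.sql.functions.broadcast;

public class BroadcastJoinSketch {

    // Joins a large table with a small lookup table on a shared column,
    // hinting Spark to ship the small side to every executor.
    static Dataset<Row> joinWithBroadcast(Dataset<Row> large, Dataset<Row> small, String key) {
        return large.join(broadcast(small), key);
    }

    // Hypothetical "large" table: (order_id, code).
    static Dataset<Row> sampleOrders(SparkSession spark) {
        StructType schema = new StructType()
                .add("order_id", DataTypes.IntegerType)
                .add("code", DataTypes.StringType);
        return spark.createDataFrame(Arrays.asList(
                RowFactory.create(1, "a"),
                RowFactory.create(2, "b"),
                RowFactory.create(3, "a")), schema);
    }

    // Hypothetical small lookup table: (code, label).
    static Dataset<Row> sampleLookup(SparkSession spark) {
        StructType schema = new StructType()
                .add("code", DataTypes.StringType)
                .add("label", DataTypes.StringType);
        return spark.createDataFrame(Arrays.asList(
                RowFactory.create("a", "Alpha"),
                RowFactory.create("b", "Beta")), schema);
    }

    public static void main(String[] args) {
        // Local master is an assumption for trying the sketch out.
        SparkSession spark = SparkSession.builder()
                .appName("broadcast-join-sketch")
                .master("local[*]")
                .getOrCreate();

        Dataset<Row> joined = joinWithBroadcast(sampleOrders(spark), sampleLookup(spark), "code");
        joined.show();

        spark.stop();
    }
}
```

With this approach the lookup data stays a `DataFrame` and never has to be collected into a `Map` by hand. Note that `createOrReplaceTempView` and `createGlobalTempView` only register a name for the Dataset in the catalog so it can be queried with SQL; on their own they do not load or cache the data in memory.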