I've tried this with all combinations of Scala, JDK, and Spark versions available to me, and I get the same error every time. Here I will show the Scala 2.13 + JDK 11 + Spark 3.3.1 attempt, but as I said, all combinations result in the same error:
export JAVA_HOME=$(/usr/libexec/java_home -v 11)
export SPARK_HOME=~/opt/spark/spark-3.3.1-bin-hadoop3-scala2.13
$SPARK_HOME/bin/spark-shell \
  -c spark.hadoop.fs.AbstractFileSystem.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS \
  -c spark.hadoop.fs.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem \
  --packages "com.google.cloud.spark:spark-bigquery-with-dependencies_2.13:0.28.0,com.google.cloud.bigdataoss:gcs-connector:hadoop3-2.2.10"
import org.apache.spark.sql._
import org.apache.spark.sql.types._
val df = spark.createDataFrame(
  java.util.List.of(
    Row(1, "foo"),
    Row(2, "bar")),
  StructType(
    StructField("a", IntegerType) ::
    StructField("b", StringType) ::
    Nil))
df.show()
That results in:
+---+---+
| a| b|
+---+---+
| 1|foo|
| 2|bar|
+---+---+
df.write.
  format("bigquery").
  mode("overwrite").
  option("project", "<redacted>").
  option("parentProject", "<redacted>").
  option("dataset", "<redacted>").
  option("credentials", bigquery_credentials_b64).
  option("temporaryGcsBucket", "<redacted>").
  save("test_table")
When I run the write, I get:
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:137)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3467)
at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
at com.google.cloud.spark.bigquery.SparkBigQueryUtil.getUniqueGcsPath(SparkBigQueryUtil.java:127)
at com.google.cloud.spark.bigquery.SparkBigQueryUtil.createGcsPath(SparkBigQueryUtil.java:108)
... 75 elided
Caused by: java.lang.reflect.InvocationTargetException: java.lang.VerifyError: Bad type on operand stack
Exception Details:
Location:
com/google/api/ClientProto.registerAllExtensions(Lcom/google/protobuf/ExtensionRegistryLite;)V @4: invokevirtual
Reason:
Type 'com/google/protobuf/GeneratedMessage$GeneratedExtension' (current frame, stack[1]) is not assignable to 'com/google/protobuf/ExtensionLite'
Current Frame:
bci: @4
flags: { }
locals: { 'com/google/protobuf/ExtensionRegistryLite' }
stack: { 'com/google/protobuf/ExtensionRegistryLite', 'com/google/protobuf/GeneratedMessage$GeneratedExtension' }
Bytecode:
0000000: 2ab2 0002 b600 032a b200 04b6 0003 2ab2
0000010: 0005 b600 03b1
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:135)
... 83 more
Caused by: java.lang.VerifyError: Bad type on operand stack
Exception Details:
Location:
com/google/api/ClientProto.registerAllExtensions(Lcom/google/protobuf/ExtensionRegistryLite;)V @4: invokevirtual
Reason:
Type 'com/google/protobuf/GeneratedMessage$GeneratedExtension' (current frame, stack[1]) is not assignable to 'com/google/protobuf/ExtensionLite'
Current Frame:
bci: @4
flags: { }
locals: { 'com/google/protobuf/ExtensionRegistryLite' }
stack: { 'com/google/protobuf/ExtensionRegistryLite', 'com/google/protobuf/GeneratedMessage$GeneratedExtension' }
Bytecode:
0000000: 2ab2 0002 b600 032a b200 04b6 0003 2ab2
0000010: 0005 b600 03b1
... 5 elided and 88 more
The VerifyError is a classpath conflict: com.google.api.ClientProto was compiled against a protobuf-java version in which GeneratedMessage$GeneratedExtension is assignable to ExtensionLite, but at runtime classes from two different protobuf-java versions are evidently being mixed (Spark's Hadoop libraries bundle an old protobuf, while the connectors' transitive dependencies bring a newer one). The solution is to use custom shaded JARs, so the connectors carry relocated copies of these dependencies, as sketched below. Managed Spark environments like Databricks and Amazon EMR have already solved this, but it is surprisingly complex to get working in a local spark-shell environment.
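A minimal sketch of what that can look like, assuming the -shaded classifier artifact that the GCS connector publishes on Maven Central (the download URL and local jar path are illustrative):

# Use the shaded GCS connector jar directly instead of resolving it via
# --packages, so its protobuf/guava dependencies stay relocated inside the jar.
curl -fLo /tmp/gcs-connector-shaded.jar \
  "https://repo1.maven.org/maven2/com/google/cloud/bigdataoss/gcs-connector/hadoop3-2.2.10/gcs-connector-hadoop3-2.2.10-shaded.jar"

# spark-bigquery-with-dependencies is already a shaded artifact, so it can
# keep coming from --packages; only the GCS connector moves to --jars.
$SPARK_HOME/bin/spark-shell \
  -c spark.hadoop.fs.AbstractFileSystem.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS \
  -c spark.hadoop.fs.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem \
  --jars /tmp/gcs-connector-shaded.jar \
  --packages "com.google.cloud.spark:spark-bigquery-with-dependencies_2.13:0.28.0"

With only shaded artifacts on the classpath, the connectors no longer see the mismatched protobuf classes that trigger the VerifyError above.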