I am trying to run a Spark application on Dataproc that reads data from and writes data to Cloud Bigtable.
Initially, I got this exception:
java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument
From the Google documentation [Manage Java and Scala dependencies for Apache Spark][1], I learned that this is caused by a dependency conflict with the libraries already present on the cluster. Following those instructions, I changed my build.sbt file to shade the conflicting jars:
assembly / assemblyShadeRules := Seq(
ShadeRule.rename("com.google.common.**" -> "repackaged.com.google.common.@1").inAll,
ShadeRule.rename("com.google.protobuf.**" -> "repackaged.com.google.protobuf.@1").inAll,
ShadeRule.rename("io.grpc.**" -> "repackaged.io.grpc.@1").inAll
)
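These shade rules rely on the sbt-assembly plugin. For completeness, a minimal project/plugins.sbt sketch, assuming sbt-assembly 1.x (the `assembly / ...` slash syntax also needs sbt 1.x; the version shown is an assumption, not from the original post):

// project/plugins.sbt: provides the assembly task and the ShadeRule API used above
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "1.2.0")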
Then I got this error:
repackaged.io.grpc.ManagedChannelProvider$ProviderNotFoundException: No functional channel service provider found. Try adding a dependency on the grpc-okhttp, grpc-netty, or grpc-netty-shaded artifact
    at repackaged.io.grpc.ManagedChannelProvider.provider(ManagedChannelProvider.java:45)
    at repackaged.io.grpc.ManagedChannelBuilder.forAddress(ManagedChannelBuilder.java:39)
    at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.createSingleChannel(InstantiatingGrpcChannelProvider.java:353)
    at com.google.api.gax.grpc.ChannelPool.<init>(ChannelPool.java:107)
    at com.google.api.gax.grpc.ChannelPool.create(ChannelPool.java:85)
    at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.createChannel(InstantiatingGrpcChannelProvider.java:237)
    at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.getTransportChannel(InstantiatingGrpcChannelProvider.java:231)
    at com.google.api.gax.rpc.ClientContext.create(ClientContext.java:201)
    at com.google.cloud.bigtable.data.v2.stub.EnhancedBigtableStub.create(EnhancedBigtableStub.java:175)
    at com.google.cloud.bigtable.data.v2.BigtableDataClient.create(BigtableDataClient.java:165)
    at com.groupon.crm.BigtableClient$.getDataClient(BigtableClient.scala:59)
    ... 44 elided
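For context on this error: gRPC discovers its transport implementations at runtime through java.util.ServiceLoader, which reads registration files under META-INF/services. Shading renames the io.grpc classes, but the registration files inside the dependency jars may still point at the original, unshaded names, so the lookup can come up empty. A small diagnostic sketch, assuming the shaded class name produced by the rules above and the assembled jar on the classpath:

import java.util.ServiceLoader
import scala.collection.JavaConverters._

// List the ManagedChannelProvider implementations that ServiceLoader can see.
// "repackaged.io.grpc.ManagedChannelProvider" is the post-shading name from
// the rules above; without shading it would be "io.grpc.ManagedChannelProvider".
val providerClass = Class.forName("repackaged.io.grpc.ManagedChannelProvider")
val providers = ServiceLoader.load(providerClass).iterator().asScala.toList
println(s"Providers found: ${providers.map(_.getClass.getName)}")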
Following that, I added the grpc-netty dependency to my build.sbt file:
libraryDependencies += "io.grpc" % "grpc-netty" % "1.49.2"
Still, I am getting the same error.
Environment details

Dataproc details -
"software_config": {
"image_version": "1.5-debian10",
"properties": {
"dataproc:dataproc.logging.stackdriver.job.driver.enable": "true",
"dataproc:dataproc.logging.stackdriver.enable": "true",
"dataproc:jobs.file-backed-output.enable": "true",
"dataproc:dataproc.logging.stackdriver.job.yarn.container.enable": "true",
"capacity-scheduler:yarn.scheduler.capacity.resource-calculator" : "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator",
"hive:hive.server2.materializedviews.cache.at.startup": "false",
"spark:spark.jars":"XXXX"
},
"optional_components": ["ZEPPELIN","ANACONDA","JUPYTER"]
}
Spark job details (build.sbt dependencies) -
val sparkVersion = "2.4.0"
libraryDependencies += "org.apache.spark" %% "spark-core" % sparkVersion % "provided"
libraryDependencies += "org.apache.spark" %% "spark-sql" % sparkVersion % "provided"
libraryDependencies += "org.apache.spark" %% "spark-hive" % sparkVersion % "provided"
libraryDependencies += "com.google.cloud" % "google-cloud-bigtable" % "2.23.1"
libraryDependencies += "com.google.auth" % "google-auth-library-oauth2-http" % "1.17.0"
libraryDependencies += "io.grpc" % "grpc-netty" % "1.49.2"
Finally, I solved the issue myself by following these steps. The root cause seems to be that gRPC also locates its default load balancer and DNS name resolver through ServiceLoader registration files, and those registrations were being lost in my assembly jar, so I restored them by hand.

1. In src/main/resources, add a META-INF directory, and inside that folder, add a services directory.
2. In the src/main/resources/META-INF/services directory, add 2 files, namely io.grpc.LoadBalancerProvider and io.grpc.NameResolverProvider.
3. In the io.grpc.LoadBalancerProvider file, add the line io.grpc.internal.PickFirstLoadBalancerProvider.
4. In the io.grpc.NameResolverProvider file, add the line io.grpc.internal.DnsNameResolverProvider (the resulting layout is sketched after the build.sbt changes below).
5. Change build.sbt as follows:

libraryDependencies += "io.grpc" % "grpc-netty-shaded" % "1.55.1"
assembly / assemblyShadeRules := Seq(
ShadeRule.rename("com.google.protobuf.**" -> "shade_proto.@1").inAll,
ShadeRule.rename("com.google.common.**" -> "shade_googlecommon.@1").inAll
)
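// Note: unlike my first attempt, io.grpc.** is deliberately NOT shaded here,
// since renaming it broke gRPC's ServiceLoader-based provider discovery.
// Only protobuf and guava are renamed, which is enough to fix the original
// NoSuchMethodError, and grpc-netty-shaded brings its own relocated Netty.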
assembly / assemblyMergeStrategy := {
case path if path.contains("META-INF/services") => MergeStrategy.concat
case PathList("META-INF", _*) => MergeStrategy.discard
case _ => MergeStrategy.first
}
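With this setup, the resource layout looks like the sketch below, and the META-INF/services entries survive into the assembly jar because of the MergeStrategy.concat rule: multiple dependency jars can ship a service file at the same path, and concatenating them keeps every registered provider.

src/main/resources/META-INF/services/io.grpc.LoadBalancerProvider
    (contains the single line: io.grpc.internal.PickFirstLoadBalancerProvider)
src/main/resources/META-INF/services/io.grpc.NameResolverProvider
    (contains the single line: io.grpc.internal.DnsNameResolverProvider)

To double-check the final artifact, a small Scala sketch (the jar path is a placeholder for your actual assembly output):

import java.util.jar.JarFile
import scala.collection.JavaConverters._

// Confirm the gRPC service registration files survived shading and merging.
val jar = new JarFile("target/scala-2.12/myapp-assembly.jar") // placeholder path
jar.entries().asScala
  .map(_.getName)
  .filter(name => name.startsWith("META-INF/services/io.grpc"))
  .foreach(println)
jar.close()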