I am trying to use Google Cloud Dataproc Serverless with the spark.jars.repositories option:
gcloud beta dataproc batches submit pyspark sample.py --project=$GCP_PROJECT --region=$MY_REGION --properties \
spark.jars.repositories='https://my.repo.com:443/artifactory/my-maven-prod-group',\
spark.jars.packages='com.spark.mypackage:my-module-jar',spark.dataproc.driverEnv.javax.net.ssl.trustStore=.,\
spark.driver.extraJavaOptions='-Djavax.net.ssl.trustStore=. -Djavax.net.debug=true' \
--files=my-ca-bundle.crt
This fails with the following exception:
javax.net.ssl.SSLHandshakeException: java.security.cert.CertPathValidatorException
I tried to set the javax.net.ssl.trustStore property via both spark.dataproc.driverEnv and spark.driver.extraJavaOptions, but it is not working.
Is it possible to fix this issue by setting the right config properties and values, or is a custom image with pre-installed certificates the only solution?
You need a Java trust store (JKS) with your certificate imported; the .crt bundle alone is not a trust store, which is why the handshake fails. Then submit the batch with:
--files=my-trust-store.jks \
--properties spark.driver.extraJavaOptions='-Djavax.net.ssl.trustStore=./my-trust-store.jks',spark.executor.extraJavaOptions='-Djavax.net.ssl.trustStore=./my-trust-store.jks'