I'm new to EMR Serverless and I want to know how to pass JARs and packages to a Spark application, as in this example:
spark-submit --deploy-mode client --jars /usr/lib/hudi/hudi-spark3.3-bundle_2.12-0.11.1-amzn-0.jar,/usr/lib/hudi/hudi-utilities-bundle_2.12-0.11.1-amzn-0.jar ...
I want to set these when I submit a job, but I cannot find a way to do it.
Can someone help me with this, please?
When you submit a job to EMR Serverless in the console and want to provide additional options to spark-submit, you can use the "Spark properties" section. Instead of --jars, you can use the spark.jars key and set its value to the same comma-separated list of JAR paths.
Your Spark application will be a Python script or JAR file on S3 provided as the "Script location" aka entrypoint.
Also note that Hudi is available on the EMR Serverless image and there's some documentation on using Hudi with EMR Serverless.
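If you submit jobs from code rather than the console, the same idea can be sketched with boto3's `start_job_run`, where the Spark properties go into `sparkSubmitParameters` as `--conf` entries. This is only a minimal sketch under some assumptions: the helper names (`build_spark_submit_parameters`, `build_job_driver`, `submit`) are hypothetical, the S3 path and IDs are placeholders, and the JAR paths are copied from the question, so you should check that they actually exist on the EMR Serverless image for your release. The `spark.jars.packages` property is Spark's counterpart to `--packages`; it is included here as an option the console answer above does not cover.

```python
def build_spark_submit_parameters(jars, packages=None):
    """Render --jars / --packages as equivalent --conf Spark properties."""
    confs = {"spark.jars": ",".join(jars)}
    if packages:
        # spark.jars.packages is the property form of --packages (Maven coordinates).
        confs["spark.jars.packages"] = ",".join(packages)
    return " ".join(f"--conf {key}={value}" for key, value in confs.items())


def build_job_driver(entry_point, jars, packages=None):
    """Build the jobDriver structure expected by start_job_run."""
    return {
        "sparkSubmit": {
            # "Script location" in the console: a Python script or JAR on S3.
            "entryPoint": entry_point,
            "sparkSubmitParameters": build_spark_submit_parameters(jars, packages),
        }
    }


def submit(application_id, execution_role_arn, job_driver):
    """Submit the job; needs boto3 and AWS credentials, so it is not called here."""
    import boto3

    client = boto3.client("emr-serverless")
    response = client.start_job_run(
        applicationId=application_id,
        executionRoleArn=execution_role_arn,
        jobDriver=job_driver,
    )
    return response["jobRunId"]


if __name__ == "__main__":
    # JAR paths taken from the question; verify them for your EMR release.
    hudi_jars = [
        "/usr/lib/hudi/hudi-spark3.3-bundle_2.12-0.11.1-amzn-0.jar",
        "/usr/lib/hudi/hudi-utilities-bundle_2.12-0.11.1-amzn-0.jar",
    ]
    # Placeholder entry point, not a real bucket.
    driver = build_job_driver("s3://my-bucket/scripts/job.py", hudi_jars)
    print(driver["sparkSubmit"]["sparkSubmitParameters"])
```

Building the driver separately from the call to `submit` makes it easy to print and inspect the exact parameters before anything is sent to AWS.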