I have an ini file that sets some environment-specific properties.
cmds.ini
[DEV]
spark_submit=spark3-submit
[PROD]
spark_submit=spark3-submit
I am parsing this file in a shell script and defining a spark-submit() function that shadows the original spark-submit command:
source_cmds.sh
#!/bin/bash
env=${1^^}    # upper-case the environment name, e.g. dev -> DEV
eval "$(      # quoted so the emitted assignments survive intact
awk -v section="[$env]" '
$0 == section {found=1; next}    # reached the requested section header
/^\[/{found=0}                   # any other section header ends it
found && /^[^#;]/ {              # skip comment lines
    gsub(/^[ \t]+|[ \t]+$/, "")  # trim leading/trailing whitespace
    print
}
' /path/to/cmds.ini |
sed 's/ *= */=/g'                # normalize "key = value" to "key=value"
)"
spark-submit(){
    "$spark_submit" "$@"
}
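To see exactly what eval executes, the same awk | sed pipeline can be run on its own; for the [DEV] section above it emits a single assignment:

awk -v section="[DEV]" '$0 == section {found=1; next} /^\[/{found=0} found && /^[^#;]/ {gsub(/^[ \t]+|[ \t]+$/, ""); print}' /path/to/cmds.ini | sed 's/ *= */=/g'
# prints: spark_submit=spark3-submit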
Here's how I source and use this script in another wrapper script (wrapper.sh):
wrapper.sh
#!/bin/bash
source /path/to/source_cmds.sh DEV
spark-submit /path/to/pyspark_script.py
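Note that bash resolves function names before searching the PATH, so after sourcing, the spark-submit function shadows the real binary inside wrapper.sh. The type builtin confirms which one will actually run:

type spark-submit   # reports "spark-submit is a function" followed by its body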
I wanted to include additional environment-specific properties in cmds.ini and updated it as follows:
[DEV]
spark_submit=spark3-submit
conf_args="--conf 'spark.driver.extraJavaOptions=-Djava.io.tmpdir=/tmp/path/' --conf 'spark.executor.extraJavaOptions=-Djava.io.tmpdir=/tmp/path'"
[PROD]
spark_submit=spark3-submit
conf_args=
I also modified source_cmds.sh to pass conf_args to the spark-submit function:
spark-submit(){
    "$spark_submit" $conf_args "$@"
}
Now, when I run wrapper.sh, Spark shows the following warnings:
Warning: Ignoring non-Spark config property: 'spark.driver.extraJavaOptions
Warning: Ignoring non-Spark config property: 'spark.executor.extraJavaOptions
However, running the same properties directly via the spark-submit command works without any issues:
spark-submit \
--conf 'spark.driver.extraJavaOptions=-Djava.io.tmpdir=/tmp/path/' \
--conf 'spark.executor.extraJavaOptions=-Djava.io.tmpdir=/tmp/path' \
/path/to/pyspark_script.py
Questions:

1. Why do these properties trigger warnings when they come from cmds.ini but work fine when passed directly?
2. How does the shell treat the arguments differently when conf_args is defined in cmds.ini?
3. How can I fix the spark-submit function implementation to properly handle such arguments?

Seems like the issue lies in the way I store conf_args. Because it is a plain string, the unquoted $conf_args expansion is split on whitespace, but the embedded single quotes are never removed: the shell strips only quotes that appear literally in the command line, not ones produced by a variable expansion. Spark therefore sees a property literally named 'spark.driver.extraJavaOptions (leading quote included) and ignores it.
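A minimal sketch of the effect (printf just echoes each argument the shell actually produces):

conf_args="--conf 'spark.driver.extraJavaOptions=-Djava.io.tmpdir=/tmp/path/'"
printf '[%s]\n' $conf_args
# [--conf]
# ['spark.driver.extraJavaOptions=-Djava.io.tmpdir=/tmp/path/']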
Store conf_args as an array of values and unpack it as shown below. This lets each element of the array reach spark-submit as a separate argument.
cmds.ini
[DEV]
spark_submit=spark3-submit
conf_args=(--conf "spark.driver.extraJavaOptions=-Djava.io.tmpdir=/tmp/path/" --conf "spark.executor.extraJavaOptions=-Djava.io.tmpdir=/tmp/path")
[PROD]
spark_submit=spark3-submit
conf_args=()
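(Strictly speaking, array syntax like this is no longer valid INI; it only works because source_cmds.sh ultimately evals the file's contents as bash rather than feeding them to a real INI parser.)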
source_cmds.sh
spark-submit(){
    "$spark_submit" "${conf_args[@]}" "$@"
}
With this, Spark receives each --conf and its value as separate arguments, and the warnings disappear.
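As a quick sanity check after sourcing, declare -p prints the array exactly as bash stores it, confirming that every element is a separate word:

source /path/to/source_cmds.sh DEV
declare -p conf_args
# declare -a conf_args=([0]="--conf" [1]="spark.driver.extraJavaOptions=-Djava.io.tmpdir=/tmp/path/" [2]="--conf" [3]="spark.executor.extraJavaOptions=-Djava.io.tmpdir=/tmp/path")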