python apache-spark cluster-mode

Can I add arguments to Python code when I submit a Spark job?


I'm trying to use spark-submit to execute my Python code on a Spark cluster.

Generally, we run spark-submit with Python code like this:

# Run a Python application on a cluster
./bin/spark-submit \
  --master spark://207.184.161.138:7077 \
  my_python_code.py \
  1000

But I want to run my_python_code.py while passing it several arguments. Is there a smart way to pass arguments?


Solution

  • Yes: Put this in a file called args.py

    import sys

    # Print the command-line arguments passed to the script
    print(sys.argv)
    

    If you run

    spark-submit args.py a b c d e 
    

    You will see:

    ['/spark/args.py', 'a', 'b', 'c', 'd', 'e']
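
    If you need more structure than raw sys.argv, the standard-library argparse module works just as well under spark-submit, because everything after the application file is forwarded to your script unchanged. Here is a minimal sketch; the flag names --iterations and --input are purely illustrative:

    import argparse

    # Define the expected flags; both names are made-up examples
    parser = argparse.ArgumentParser()
    parser.add_argument("--iterations", type=int, default=1000)
    parser.add_argument("--input", default="data.txt")
    args = parser.parse_args()

    print(args.iterations, args.input)

    You could then invoke it with, for example:

    spark-submit args.py --iterations 500 --input /data/sample.txt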