I'm trying to run a PySpark job I created that downloads and uploads data from S3 using the boto3 library. The job runs fine in PyCharm, but when I run it in NiFi using this template https://github.com/Teradata/kylo/blob/master/samples/templates/nifi-1.0/template-starter-pyspark.xml
the ExecutePySpark processor errors with "No module named boto3".
I made sure boto3 is installed in my conda environment, which is active.
Any ideas? I'm sure I'm missing something obvious.
Here is a picture of the NiFi ExecutePySpark processor configuration.
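For context, here's roughly what the job does (a trimmed-down sketch; the bucket name, keys, and local paths below are placeholders, not my actual values):

    from pyspark.sql import SparkSession
    import boto3

    spark = SparkSession.builder.appName("s3_boto3_job").getOrCreate()
    s3 = boto3.client("s3")

    # Download the input file from S3 with boto3 (placeholder bucket/key)
    s3.download_file("my-bucket", "input/data.csv", "/tmp/data.csv")

    # Process it with Spark
    df = spark.read.csv("/tmp/data.csv", header=True)
    row_count = df.count()

    # Write a small result file locally, then upload it back to S3
    with open("/tmp/summary.txt", "w") as f:
        f.write(f"rows: {row_count}\n")
    s3.upload_file("/tmp/summary.txt", "my-bucket", "output/summary.txt")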
Thanks, tim
The Python environment that PySpark runs in is configured via the PYSPARK_PYTHON environment variable.
Set it in conf/spark-env.sh, pointing at the Python executable inside your conda environment:

    export PYSPARK_PYTHON=PATH_TO_YOUR_CONDA_ENV/bin/python
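To confirm the processor is actually picking up that interpreter, a quick sanity check (a minimal sketch, assuming the driver output is visible in the NiFi/Spark logs) is to log the interpreter path and the boto3 import from inside the job:

    import sys
    print("interpreter:", sys.executable)  # should point into your conda env

    import boto3
    print("boto3:", boto3.__version__)

Depending on your Spark version, passing --conf spark.pyspark.python=... to spark-submit is an alternative to editing spark-env.sh.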