Tags: amazon-s3, pyspark, boto3, apache-nifi, kylo

nifi pyspark - "no module named boto3"


I'm trying to run a PySpark job I created that downloads data from and uploads data to S3 using the boto3 library. The job runs fine in PyCharm, but it fails when I try to run it in NiFi using this template: https://github.com/Teradata/kylo/blob/master/samples/templates/nifi-1.0/template-starter-pyspark.xml

The ExecutePySpark processor fails with "No module named boto3".

I made sure boto3 is installed in my active conda environment.

Any ideas? I'm sure I'm missing something obvious.

Here is a picture of the NiFi Spark processor.


Thanks, Tim


Solution

  • The Python environment that PySpark runs in is configured via the PYSPARK_PYTHON environment variable. If NiFi launches Spark with a different interpreter than the one PyCharm uses, that interpreter will not see the packages installed in your conda environment, hence the "No module named boto3" error.
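As a minimal sketch, assuming your conda environment is named `myenv` and lives under `/opt/conda` (both are example values; substitute the actual path of the environment where boto3 is installed), you could export the variable in the environment that launches NiFi/Spark:

```shell
# Example only: point PySpark at the conda env's interpreter.
# Replace /opt/conda/envs/myenv with your real conda env path.
export PYSPARK_PYTHON=/opt/conda/envs/myenv/bin/python

# Sanity check: confirm the interpreter PySpark will use can import boto3.
"$PYSPARK_PYTHON" -c "import boto3; print(boto3.__version__)"
```

The same variable can also be set in spark-env.sh, or passed through the NiFi processor's environment, as long as it is visible to the process that spawns the PySpark job.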