But it works when running C:...\cwd> python SimpleApp.py. It is the code from https://spark.apache.org/docs/latest/quick-start.html, section 'Self-Contained Applications'.
I placed setup.py and SimpleApp.py in my project directory.
setup.py code:
from setuptools import setup, find_packages

setup(
    name='my-spark-project',
    version='0.1',
    packages=find_packages(),
    install_requires=[
        'pyspark==3.5.1'
        # Add other dependencies here
    ],
)
SimpleApp.py code:
"""SimpleApp.py"""
from pyspark.sql import SparkSession
logFile = "C:\\apache-spark\\README.md" # Should be some file on your system
spark = SparkSession.builder.appName("SimpleApp").getOrCreate()
logData = spark.read.text(logFile).cache()
numAs = logData.filter(logData.value.contains('a')).count()
numBs = logData.filter(logData.value.contains('b')).count()
print("Lines with a: %i, lines with b: %i" % (numAs, numBs))
spark.stop()
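For reference, the counting logic the job performs is simple on its own; here is a plain-Python sketch of the same filter/count, using a few made-up sample lines rather than the real README (the actual job reads C:\apache-spark\README.md):

```python
# Plain-Python equivalent of the Spark filter/count logic.
# The sample lines are hypothetical, for illustration only.
sample = ["apple banana", "cherry", "bread", "kiwi"]

num_as = sum(1 for line in sample if "a" in line)  # lines containing 'a'
num_bs = sum(1 for line in sample if "b" in line)  # lines containing 'b'

print("Lines with a: %i, lines with b: %i" % (num_as, num_bs))
# -> Lines with a: 2, lines with b: 2
```

So the script itself is not doing anything exotic; the failure is about how spark-submit launches Python, not about the DataFrame code.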
and then I executed:
> pip install .
> spark-submit --master local[*] SimpleApp.py
the result is like:

Python
24/04/05 14:22:35 INFO ShutdownHookManager: Shutdown hook called
24/04/05 14:22:35 INFO ShutdownHookManager: Deleting directory C:\Users\hendr\AppData\Local\Temp\spark-e91e861f-3f9b-4e18-b064-44bee42a2fb0
I did exactly as the document says.
I'm not entirely certain, but perhaps you could try two different approaches:
1. Use findspark (install it first with pip install findspark; see https://pypi.org/project/findspark/), then at the top of your script:

   import findspark
   findspark.init("C:\\spark")

2. Call spark-submit by its full path instead of relying on PATH:

   path/to/your/spark-submit --master local[*] SimpleApp.py
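If it helps to see what approach 1 actually does: findspark.init essentially exports SPARK_HOME and puts Spark's bundled Python sources on sys.path so that import pyspark resolves. A simplified sketch of that idea follows; the C:\spark path is just a placeholder, and the real findspark does more (for example, it also adds the bundled py4j zip):

```python
import os
import sys

def init_spark(spark_home):
    # Simplified sketch of what findspark.init does: export SPARK_HOME
    # and make the pyspark sources shipped with Spark importable.
    # (Illustrative only; the real library handles more edge cases.)
    os.environ["SPARK_HOME"] = spark_home
    sys.path.insert(0, os.path.join(spark_home, "python"))

init_spark("C:\\spark")  # placeholder path; point it at your Spark install
```

This is why findspark can rescue a script launched with a plain python interpreter that otherwise cannot find Spark.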