I have installed pyspark in a miniconda environment on Ubuntu through `conda install pyspark`. So far everything works fine: I can run jobs through `spark-submit` and I can inspect running jobs at `localhost:4040`. But I can't locate `start-history-server.sh`, which I need in order to look at jobs that have completed.
It is supposed to be in `{spark}/sbin`, where `{spark}` is the installation directory of Spark. I'm not sure where that is supposed to be when Spark is installed through conda, but I have searched the entire miniconda directory and I can't seem to locate `start-history-server.sh`. For what it's worth, this happens in both Python 3.7 and 2.7 environments.
My question is: is `start-history-server.sh` included in a conda installation of pyspark? If yes, where? If no, what's the recommended alternative way of evaluating Spark jobs after the fact?
EDIT: I've filed a pull request to add the history server scripts to pyspark. The pull request has been merged, so this should tentatively show up in Spark 3.0.
As @pedvaljim points out in a comment, this is not conda-specific: the `sbin` directory isn't included in pyspark at all.
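If you want to confirm this yourself, something along these lines (a quick check assuming the conda environment is activated, using `pyspark.__file__` only to locate the package) should show that no `sbin` directory ships with the package:

```bash
# Locate the pyspark package that conda installed (assumes the env is active).
PYSPARK_HOME=$(python -c "import pyspark, os; print(os.path.dirname(pyspark.__file__))")
echo "$PYSPARK_HOME"   # e.g. .../site-packages/pyspark
ls "$PYSPARK_HOME"     # a bin/ directory is there, but no sbin/
```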
The good news is that you can manually copy this directory from GitHub into your pyspark installation (I'm not sure how to download just one directory, so I simply cloned all of Spark); see the sketch below. If you're using mini- or anaconda, the pyspark folder is e.g. miniconda3/envs/{name_of_environment}/lib/python3.7/site-packages/pyspark.
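For what it's worth, here is a rough sketch of the whole workaround, assuming Spark 2.x and the directory layout above; the clone location, event-log settings, `/tmp/spark-events` directory, and `my_job.py` are illustrative examples, not part of the original post:

```bash
# Rough sketch of the workaround; paths and config values are examples only.

# 1. Grab the sbin/ scripts from the Spark source tree (cloning everything
#    is heavy-handed but simple).
git clone https://github.com/apache/spark.git /tmp/spark-src

# 2. Copy sbin/ next to the bin/ directory that ships with the conda package.
PYSPARK_HOME=$(python -c "import pyspark, os; print(os.path.dirname(pyspark.__file__))")
cp -r /tmp/spark-src/sbin "$PYSPARK_HOME/sbin"

# 3. The history server only shows jobs that wrote event logs, so submit
#    jobs with event logging enabled (my_job.py is a placeholder).
mkdir -p /tmp/spark-events
spark-submit \
  --conf spark.eventLog.enabled=true \
  --conf spark.eventLog.dir=/tmp/spark-events \
  my_job.py

# 4. Start the history server against the same directory, then browse
#    localhost:18080 to see completed jobs.
SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=/tmp/spark-events" \
  "$PYSPARK_HOME/sbin/start-history-server.sh"
```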