I've installed Jupyter with the --user option and it works fine. I need to run Spark and read from HDFS inside a notebook, but running Jupyter as my personal user causes file-permission problems in HDFS. I therefore decided to run it as the hdfs user (our cluster is configured so that all Spark jobs should be run by this hdfs user), but then it cannot find the dependencies that live under my personal user's /home/myuser/.local folder. Is there a way to tell Jupyter to run as the current user (hdfs) while looking for binaries and dependencies in another user's home?
Also, I'm using Toree as a gateway, in case that opens up more options.
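For concreteness, this is roughly the kind of invocation I have in mind (a sketch only; the site-packages path and the use of env are my guesses and depend on the Python version installed):

# run Jupyter as hdfs, but resolve packages and binaries from my personal home
sudo -u hdfs env \
  PYTHONPATH=/home/myuser/.local/lib/python2.7/site-packages \
  PATH=/home/myuser/.local/bin:/usr/bin:/bin \
  jupyter notebook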
Try the old Hadoop trick: set the environment variable

export HADOOP_USER_NAME=hdfs

before starting the driver, so that the driver registers as hdfs when allocating the YARN executors. (This will not work with Kerberos, of course; but then it's just a matter of authenticating as hdfs against Kerberos...)
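In your case that means you can keep launching Jupyter as your personal user, so everything under /home/myuser/.local is still found, while Hadoop sees the jobs as hdfs. A sketch (assuming a plain Jupyter launch; adapt to however Toree starts the kernel for you):

# impersonate hdfs for all HDFS/YARN access from this session
export HADOOP_USER_NAME=hdfs
# dependencies still resolve from /home/myuser/.local as usual
jupyter notebook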