I've installed Jupyter with the --user option and it works fine. I need to run Spark and read from HDFS inside a notebook, but running Jupyter as my personal user causes file-permission problems in HDFS. I therefore decided to run it as the hdfs user (our cluster is configured so that all Spark jobs should be run by this hdfs user), but then it cannot find the dependencies that live under my personal user's /home/myuser/.local folder. Is there a way to tell Jupyter to run as the current user (hdfs) while looking for binaries and dependencies in another user's home?
Also, I'm using Toree as a gateway, in case that opens up more options.
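For concreteness, this is roughly the kind of invocation I have in mind (a sketch only; the site-packages path and the use of env are my guesses and depend on the Python version installed):

# run Jupyter as hdfs, but resolve packages and binaries from my personal home
sudo -u hdfs env \
  PYTHONPATH=/home/myuser/.local/lib/python2.7/site-packages \
  PATH=/home/myuser/.local/bin:/usr/bin:/bin \
  jupyter notebook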
Try the old Hadoop trick: set the environment variable

export HADOOP_USER_NAME=hdfs

before starting the driver, so that the driver registers as hdfs when allocating the YARN executors. (This will not work with Kerberos, of course; but then it's just a matter of authenticating as hdfs against Kerberos...)
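In your case that means you can keep launching Jupyter as your personal user, so everything under /home/myuser/.local is still found, while Hadoop sees the jobs as hdfs. A sketch (assuming a plain Jupyter launch; adapt to however Toree starts the kernel for you):

# impersonate hdfs for all HDFS/YARN access from this session
export HADOOP_USER_NAME=hdfs
# dependencies still resolve from /home/myuser/.local as usual
jupyter notebook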