Tags: python, amazon-s3, import, amazon-emr

Access data on EMR directory from EMR Studio: Workspaces (Notebooks)


I have some data saved on S3 that I want to import while running a Python script on EMR.

When I run the script from the EMR console, I first copy the files from S3 to the cluster so that they form a directory structure such as /home/mysource/settings. The following code then works as expected:

import sys
sys.path.append("/home")  # assumed: the copied mysource/ package sits directly under /home
from mysource.settings import *
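A minimal sketch of the copy step, assuming the modules live under a hypothetical S3 prefix such as `libs/mysource/` and that `client` is a boto3 S3 client. The helper names `download_s3_prefix` and `make_importable` are hypothetical; the boto3 calls used (`get_paginator("list_objects_v2")`, `download_file`) are real. Running this from inside the notebook, rather than copying files onto a node by hand, keeps the download and the import in the same Python process, which may matter because notebook code does not necessarily run on the machine where the files were copied manually.

```python
import importlib
import os
import sys


def download_s3_prefix(client, bucket, prefix, local_root, strip=""):
    """Copy every object under s3://<bucket>/<prefix> into local_root.

    The part of each key after `strip` becomes the relative local path, so
    prefix="libs/mysource/", strip="libs/", local_root="/home" gives
    /home/mysource/... . `client` is assumed to be a boto3 S3 client.
    """
    downloaded = []
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):
                continue  # zero-byte "folder" placeholders have nothing to download
            rel = key[len(strip):] if strip and key.startswith(strip) else key
            local_path = os.path.join(local_root, *rel.split("/"))
            os.makedirs(os.path.dirname(local_path), exist_ok=True)
            client.download_file(Bucket=bucket, Key=key, Filename=local_path)
            downloaded.append(local_path)
    return downloaded


def make_importable(local_root):
    """Put local_root at the front of sys.path so packages under it can be imported."""
    if local_root not in sys.path:
        sys.path.insert(0, local_root)
    # Drop cached directory listings so freshly written files are found.
    importlib.invalidate_caches()


# Example (bucket name and local path are hypothetical):
# import boto3
# s3 = boto3.client("s3")
# download_s3_prefix(s3, "my-bucket", "libs/mysource/", "/home/hadoop/pylibs", strip="libs/")
# make_importable("/home/hadoop/pylibs")
# from mysource.settings import *
```

Passing the client in, instead of creating it inside the helper, leaves the choice of credentials (explicit keys or the default chain) to the caller.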

Now I want to do the same thing from EMR Studio Workspaces, but even after attaching the EMR cluster to a Workspace notebook, I cannot get the import to work. It would be even better if I could import directly from S3, without creating this directory/file structure on EMR in the first place.
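For importing directly from S3, one option (a swapped-in technique, not something confirmed in this thread) is to read the module's source with `get_object` and execute it into a fresh module object registered in `sys.modules`, so nothing is written to the cluster's disk. The function name, bucket, and key are hypothetical, and `client` is assumed to be a boto3 S3 client. It handles a single self-contained `.py` file; a package whose modules import one another would need all of them loaded, or the download approach instead.

```python
import sys
import types


def import_module_from_s3(client, bucket, key, module_name):
    """Load a single .py file from S3 as a module without writing it to disk.

    `client` is assumed to be a boto3 S3 client; only get_object() is used.
    """
    # get_object returns a dict whose "Body" is a streaming, file-like object.
    body = client.get_object(Bucket=bucket, Key=key)["Body"]
    source = body.read().decode("utf-8")

    module = types.ModuleType(module_name)
    module.__file__ = f"s3://{bucket}/{key}"  # informational; shows up in tracebacks
    code = compile(source, module.__file__, "exec")

    # Register before executing, as the normal import system does.
    sys.modules[module_name] = module
    try:
        exec(code, module.__dict__)
    except BaseException:
        # Don't leave a half-initialised module behind if the source fails.
        sys.modules.pop(module_name, None)
        raise
    return module


# Example (bucket and key are hypothetical):
# import boto3
# s3 = boto3.client("s3")
# import_module_from_s3(s3, "my-bucket", "libs/mysource/settings.py", "settings")
# from settings import *
```

Because the module is placed in `sys.modules`, later `import` or `from ... import *` statements in the same kernel resolve to it. If the code must also be importable inside Spark tasks on the executors, `SparkContext.addPyFile` is the Spark-native way to ship a `.py` or `.zip` file to them.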

I use an access key ID and a secret access key (no session token) to transfer data between S3 and EMR.
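A sketch of passing an access key ID and secret access key to boto3 explicitly, with no session token. The helper names are hypothetical; `aws_access_key_id`, `aws_secret_access_key` and `region_name` are real `boto3.client` parameters. When both keys are omitted, boto3 falls back to its default credential chain, which on an EMR node can include the cluster's EC2 instance profile, so explicit keys may not be needed at all.

```python
def s3_client_kwargs(access_key_id=None, secret_access_key=None, region_name=None):
    """Build keyword arguments for boto3.client("s3", ...).

    Either both keys are given (long-term keys, so no session token is added)
    or neither, in which case boto3's default credential chain is used.
    """
    kwargs = {}
    if access_key_id or secret_access_key:
        if not (access_key_id and secret_access_key):
            raise ValueError("pass both access_key_id and secret_access_key, or neither")
        kwargs["aws_access_key_id"] = access_key_id
        kwargs["aws_secret_access_key"] = secret_access_key
    if region_name:
        kwargs["region_name"] = region_name
    return kwargs


def make_s3_client(access_key_id=None, secret_access_key=None, region_name=None):
    """Create a boto3 S3 client from the optional key pair."""
    import boto3  # deferred so s3_client_kwargs() works even without boto3 installed

    return boto3.client("s3", **s3_client_kwargs(access_key_id, secret_access_key, region_name))
```

Keeping the keys out of the notebook cells themselves (for example, reading them from environment variables or a secret store before calling `make_s3_client`) avoids saving them alongside the notebook.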


Solution

  • TLDR:

    More details: