amazon-web-services  jupyter-notebook  aws-glue

How do you add Python packages in AWS Glue 3.0 Jupyter Notebook jobs?


I am trying to migrate a project to AWS Glue, and to do this I need to install a few new packages. Given the structure of the project and the need to see the outputs, I want to use a Jupyter Notebook job rather than a Python Shell job. I need to install openpyxl on the job. Running !pip install openpyxl followed by import openpyxl does not work: the import cannot find the module. Apparently the package should instead be installed by adding a key-value pair as a new parameter under the Job details advanced properties section. When I add "--additional-python-modules":"openpyxl==3.1.2" to the Python Shell version of the job, Glue accepts it, but when I try the same thing on the Jupyter Notebook job, there is no option to add a new parameter.

How do I add new parameters to the Jupyter Notebook job in Glue? Is there something that I am missing here?


Solution

  • It turns out that I was going about this the wrong way. First, I needed to run the command below:

    %additional_python_modules openpyxl==3.1.2
    

    Then I needed to stop the session, since the setting only applies to a session that starts after the magic has been run:

    %stop_session
    

    After that, running any command, such as the one below, starts a new session:

    print('Start session')
    

    Then, when I import the Python package, it works:

    import openpyxl
    print("Installed Package")
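
To confirm that the restarted session really picked up the pinned release, a quick check along these lines can help. This is only a sketch: `module_version` is a hypothetical helper (not part of Glue or openpyxl), and it assumes the package exposes a `__version__` attribute, which openpyxl does. It sticks to `importlib.import_module` so that it also runs on the older Python that ships with Glue 3.0.

```python
import importlib


def module_version(name):
    """Return the module's __version__, or None if it cannot be imported.

    Hypothetical helper for illustration; falls back to "unknown" when the
    module imports fine but does not define __version__.
    """
    try:
        module = importlib.import_module(name)
    except ImportError:
        # The module is not installed in the current session.
        return None
    return getattr(module, "__version__", "unknown")


# Prints the installed version, or None if the session does not have the package.
print(module_version("openpyxl"))
```

If this prints None after the steps above, the session was most likely not restarted after the `%additional_python_modules` magic was run.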