
How do I upgrade a library in Qubole's Jupyter Notebook, using PySpark?


Is there a way to do it right from a cell in the notebook, similar to pip install ... --upgrade? I couldn't work out how to follow the instructions at https://docs.qubole.com/en/latest/faqs/general-questions/install-custom-python-libraries.html#pre-installed-python-libraries. The current Python version is 3.5.3 and Pandas is 0.20.1. I need to upgrade Pandas and Matplotlib.


Solution

  • In Qubole, there are two ways to upgrade or install a package in the Python environment. Currently, there is no interface inside the notebook for installing new packages.

    New and Recommended Way (via Package Management): Users can enable Package Management for an account and add new packages to a cluster through the UI. Package management has a number of performance and usability advantages over cluster versions. Refer to https://docs.qubole.com/en/latest/user-guide/package-management/index.html for further details.

    Old Way (via bootstrap): Users can configure a bootstrap, which is a shell script executed on each node when the cluster starts or upscales (that is, when more nodes are added to the cluster). The bootstrap is configured through the Clusters UI, and every change requires a cluster restart to take effect. This is the approach described in the link you shared.
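Whichever way you choose, you can confirm that the upgrade took effect by checking versions from a notebook cell. The sketch below is a minimal example, not a Qubole feature: the helper names (version_tuple, is_at_least, check_packages) and the minimum versions ("0.23", "2.2") are hypothetical placeholders, so substitute the releases you actually need and that still support Python 3.5. It avoids f-strings so it runs on Python 3.5.3, and it only inspects the Python process the cell runs in (the driver), not the executors.

```python
import importlib
import re


def version_tuple(version):
    """Turn a version string such as '0.20.1' into (0, 20, 1).

    Only the leading numeric part is used, so '3.0.0rc1' becomes (3, 0, 0).
    This is a simplified parser, not a full PEP 440 implementation.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    return tuple(int(part) for part in match.group(0).split("."))


def is_at_least(installed, required):
    """Return True if installed >= required, padding with zeros so '1.0' == '1.0.0'."""
    a, b = version_tuple(installed), version_tuple(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_packages(requirements):
    """Map each module name to (installed version or None, meets minimum?)."""
    report = {}
    for name, minimum in requirements.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = (None, False)  # not installed in this Python environment
            continue
        installed = getattr(module, "__version__", None)
        if installed is None:
            report[name] = (None, False)  # installed, but exposes no __version__
        else:
            report[name] = (installed, is_at_least(installed, minimum))
    return report


if __name__ == "__main__":
    # Hypothetical minimum versions; replace them with the ones you need.
    wanted = {"pandas": "0.23", "matplotlib": "2.2"}
    for name, (installed, ok) in sorted(check_packages(wanted).items()):
        print("%s: installed=%s, ok=%s" % (name, installed, ok))
```

If a package still reports 0.20.1 (or whatever the old version was), the notebook is most likely still running against the old environment, for example because the cluster has not been restarted since the change.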
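Because a bootstrap script runs separately on every node, it can be worth checking that each executor host actually sees the new version. PySpark code that uses pandas inside tasks imports it on the executors, not just on the driver. The following is a minimal sketch, assuming sc is the SparkContext the notebook provides; make_probe, summarize, and versions_by_host are hypothetical helper names, not part of PySpark or Qubole. It uses only standard RDD methods (parallelize, map, distinct, collect) and stays Python 3.5 compatible.

```python
import importlib
import socket


def make_probe(module_name):
    """Build a task function that reports (hostname, version of module_name).

    PySpark ships the returned closure to the executors, so the import runs
    in each worker's own Python environment.
    """
    def probe(_):
        try:
            module = importlib.import_module(module_name)
            version = getattr(module, "__version__", None)
        except ImportError:
            version = None  # module not installed on this node
        return (socket.gethostname(), version)
    return probe


def summarize(pairs):
    """Group (hostname, version) pairs into {hostname: set of versions}."""
    by_host = {}
    for host, version in pairs:
        by_host.setdefault(host, set()).add(version)
    return by_host


def versions_by_host(sc, module_name, num_tasks=200):
    """Run many small tasks and collect the version each executor host sees.

    Spark does not guarantee that every node receives a task; a large
    num_tasks only makes it likely that all nodes are covered.
    """
    pairs = (sc.parallelize(range(num_tasks), num_tasks)
               .map(make_probe(module_name))
               .distinct()
               .collect())
    return summarize(pairs)


# Usage in a notebook cell (assumes sc is already defined by the notebook):
#   versions_by_host(sc, "pandas")
#   versions_by_host(sc, "matplotlib")
```

A host reporting None or the old version suggests the bootstrap did not run, or failed, on that node. A host missing from the result was simply not checked, so don't read its absence as success.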