I have a bash script that looks something like this:
```bash
curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
curl https://packages.microsoft.com/config/ubuntu/16.04/prod.list > /etc/apt/sources.list.d/mssql-release.list
sudo apt-get update
sudo ACCEPT_EULA=Y apt-get -q -y install msodbcsql17
python -m pip install --upgrade pip
pip install twine keyring artifacts-keyring
pip install -r requirements.txt
```
I am basically just trying to install the Microsoft ODBC Driver 17 for SQL Server (`msodbcsql17`) and then run some Python commands.
I am trying to run this on a Databricks cluster.
When I run it with either

```
%sh
bash <path-to-bash-script.sh>
```

or

```
%sh
sh <path-to-bash-script.sh>
```
I get errors when the script tries to download the driver:
```
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 983 100 983 0 0 12287 0 --:--:-- --:--:-- --:--:-- 12287
Warning: apt-key output should not be parsed (stdout is not a terminal)
gpg: invalid option "-
"
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 79 100 79 0 0 975 0 --:--:-- --:--:-- --:--:-- 975
E: Invalid operation update
Reading package lists...
Building dependency tree...
Reading state information...
E: Unable to locate package msodbcsql17
```
Note: I am creating this file locally as part of a project, and a CI/CD pipeline then copies the file into a Databricks workspace.
However, when I run the commands from this file directly in a `%sh` cell, they run without an issue.
What exactly is the problem here?
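One cause consistent with these errors is that the locally created file has Windows (CRLF) line endings. This is not confirmed in this thread. A trailing carriage return on each line would have these effects:

- `gpg` receives `-\r` instead of `-`, which would explain why the closing quote lands on its own line.
- `apt-get` receives `update\r`, which it rejects as an invalid operation. The package index is then never refreshed, so `msodbcsql17` cannot be located.

Commands typed into a notebook cell most likely carry no carriage returns, which would explain why they work there. Below is a minimal Python sketch to check a script for CRLF endings. The helper name `find_crlf_lines` is hypothetical.

```python
from pathlib import Path


def find_crlf_lines(path):
    """Return the 1-based numbers of lines that end in CRLF (\\r\\n).

    The file is read as bytes so that Python's universal-newline
    handling cannot silently translate \\r\\n into \\n.
    """
    data = Path(path).read_bytes()
    return [
        number
        for number, line in enumerate(data.splitlines(keepends=True), start=1)
        if line.endswith(b"\r\n")
    ]
```

Point it at the copy your pipeline uploads; the exact path depends on where the pipeline puts the file. An empty list means every line ends in a plain LF.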
The reason behind this is not entirely clear; however, my best guesses are as follows:
I was able to resolve this issue by simply creating the file inside the Databricks workspace using `dbutils`. For example:
dbutils.fs.put("dbfs:/scripts/install_dependencies.sh","""
#!/bin/bash
curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
curl https://packages.microsoft.com/config/ubuntu/16.04/prod.list > /etc/apt/sources.list.d/mssql-release.list
apt-get update
ACCEPT_EULA=Y apt-get -q -y install msodbcsql17""", True)
This runs without an issue and it seems to be the recommended way to create any init scripts that you want to run on your clusters.
The downside is that you can't easily version-control these scripts, and they have to be overwritten each time a change is needed.
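The file may fail because of Windows (CRLF) line endings, though this is an assumption and not something established above. If so, the script can stay in version control as long as it is normalized to LF before upload. That can happen as a step in the CI/CD pipeline or in a notebook right before `dbutils.fs.put`. Below is a sketch under that assumption. The names `normalize_line_endings` and `load_init_script`, and the repository path, are hypothetical.

```python
from pathlib import Path


def normalize_line_endings(text):
    """Convert CRLF and bare CR line endings to LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def load_init_script(repo_path):
    """Read a version-controlled script and return it with LF line endings.

    newline="" turns off Python's newline translation on read, so any
    \\r\\n sequences are kept and then normalized explicitly.
    """
    with open(repo_path, encoding="utf-8", newline="") as f:
        return normalize_line_endings(f.read())


# Inside a Databricks notebook (dbutils only exists there), the cleaned
# script could then be written out. The repo-relative path is hypothetical:
# script = load_init_script("scripts/install_dependencies.sh")
# dbutils.fs.put("dbfs:/scripts/install_dependencies.sh", script, True)
# Path is imported for callers that want to check or clean up files locally.
```

Alternatively, a `.gitattributes` entry such as `*.sh text eol=lf` asks Git to check out shell scripts with LF endings regardless of the developer's platform.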