I need help deploying my custom Docker image onto a Databricks cluster, specifically knowing whether I need to match the exact /databricks folder structure, or do something else, to avoid this error: java.lang.SecurityException: SHA1 digest error for META-INF/services/java.sql.Driver.
The last call in the com.databricks namespace in my stack trace is com.databricks.backend.daemon.driver.DriverDaemon$.preloadJdbcDrivers(DriverDaemon.scala:943). My Dockerfile is quoted at the end.
I've spent some time creating a Docker image for developing with PySpark/Delta locally across our team's Macs and Windows machines, and it works. I followed the exact specification of Databricks Runtime 16.3.
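For reference, this is roughly how I sanity-check the image against the Runtime 16.3 system environment locally (my-dbr163-dev is a placeholder for whatever the image is tagged as):

docker run --rm my-dbr163-dev python3 --version   # expect Python 3.12.x
docker run --rm my-dbr163-dev java -version       # expect the Zulu build that is first on the PATH
docker run --rm my-dbr163-dev python3 -c "import pyspark; print(pyspark.__version__)"   # expect 3.5.2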
My next step was to have that same custom image be the environment our code runs in on Databricks as well, by adding it to a general-purpose compute cluster's configuration under Advanced options > Docker. When I start this cluster, I get the following error:
INFO DriverDaemon$: Skipping class preloading because it is not enabled in conf
INFO log: Logging initialized @21122ms to shaded.v9_4.org.eclipse.jetty.util.log.Slf4jLog
INFO DriverDaemon$: Sent a notification to chauffeur about startup exception, took 6031
ERROR DriverDaemon$: XXX Fatal uncaught exception. Terminating driver.
java.lang.SecurityException: SHA1 digest error for META-INF/services/java.sql.Driver
when the DriverDaemon is invoked through the command:
/opt/java8/bin/java [LOTS OF OPTIONS] \
-cp /databricks/hadoop-safety-jars/*:/databricks/spark/dbconf/jets3t/:/databricks/spark/dbconf/log4j/driver:/databricks/hive/conf:/databricks/spark/dbconf/hadoop:/databricks/jars/* \
com.databricks.backend.daemon.driver.DriverDaemon
My image, of course, installs neither Spark nor Hadoop under /databricks. Since Databricks discards whatever CMD or ENTRYPOINT I set in the Dockerfile, I assumed it would either pick up my Spark install regardless of location (as long as it's on the PATH) or inject its own Spark into /databricks. I couldn't find docs on what happens at cluster startup, and if my custom image has to follow this draconian folder structure, why do the Databricks Docker docs even offer building your own image as an alternative?
I tried inspecting the /databricks folder from within a cluster with no Docker on it, just to see which JARs need that SHA check, and these are the JARs that contain a META-INF/services/java.sql.Driver entry (see the sketch after the list for how I searched):
/databricks/jars/----ws_3_5--mvn--hadoop3--org.postgresql--postgresql--org.postgresql__postgresql__42.6.0.jar
/databricks/jars/----ws_3_5--third_party--mssql-jdbc--mssql-jdbc--789028999--com.microsoft.sqlserver__mssql-jdbc__11.2.3.jre8.jar
/databricks/jars/----ws_3_5--third_party--mssql--mssql-hive-2.3__hadoop-3.2_2.12--1153968230--com.microsoft.sqlserver__mssql-jdbc__11.2.2.jre8.jar
/databricks/jars/----ws_3_5--mvn--hadoop3--org.apache.derby--derby--org.apache.derby__derby__10.14.2.0.jar
/databricks/jars/----ws_3_5--mvn--hadoop3--org.apache.hive--hive-jdbc--org.apache.hive__hive-jdbc__2.3.9.jar
/databricks/jars/----ws_3_5--third_party--snowflake-jdbc--net.snowflake__snowflake-jdbc__shaded---414110472--net.snowflake__snowflake-jdbc__3.16.1.jar
/databricks/jars/----ws_3_5--mvn--hadoop3--org.xerial--sqlite-jdbc--org.xerial__sqlite-jdbc__3.42.0.0.jar
/databricks/jars/----ws_3_5--third_party--mariadb-java-client--org.mariadb.jdbc__mariadb-java-client__2.7.9.jar
/databricks/jars/----ws_3_5--third_party--bigquery-jdbc--bigquery-driver-shaded---846918551--GoogleBigQueryJDBC42.jar
/databricks/jars/----ws_3_5--third_party--spark-jdbc--databricks-jdbc-driver-shaded---1191706110--DatabricksJDBC.jar
/databricks/jars/spark-excel_2.12-3.5.0_0.20.3.jar
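This is roughly the loop I used from a %sh notebook cell on that cluster to produce the list (assuming unzip is available on the node, which it was in my case):

for jar in /databricks/jars/*.jar; do
  # keep only the JARs that ship a JDBC driver service file
  if unzip -l "$jar" 2>/dev/null | grep -q 'META-INF/services/java.sql.Driver'; then
    echo "$jar"
  fi
done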
My Dockerfile is this:
FROM ubuntu:noble-20241015
ARG DEBIAN_FRONTEND=noninteractive
# Install Python 3.12 and the system utilities/build dependencies we need,
# some of which are fetched over https and hence require ca-certificates;
# symlink python into /usr/local/bin for convenience (it is on the PATH);
# then delete the apt cache at /var/cache/apt/archives to reduce image size,
# and also get rid of the apt lists at /var/lib/apt/lists/* for the same reason.
RUN apt-get update && \
apt-get install -y --no-install-recommends \
python3.12 python3.12-dev \
ca-certificates bash iproute2 coreutils procps sudo curl \
build-essential pkg-config cmake \
# dbus-python==1.3.2 dependency:
dbus libdbus-glib-1-dev \
# psycopg2==2.9.3 dependency:
libpq-dev \
# PyGObject==3.48.2 dependency, when installing pycairo==1.28.0:
libcairo2-dev gobject-introspection libgirepository1.0-dev && \
ln -s /usr/bin/python3.12 /usr/local/bin/python && \
ln -s /usr/bin/python3.12 /usr/local/bin/python3 && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
# https://cdn.azul.com/zulu/bin/zulu8.33.0.1-ca-jdk8.0.192-linux_x64.tar.gz
# https://cdn.azul.com/zulu-embedded/bin/zulu8.33.0.135-jdk1.8.0_192-linux_aarch64.tar.gz
# https://cdn.azul.com/zulu/bin/zulu17.54.21-ca-jdk17.0.13-linux_x64.tar.gz
# https://cdn.azul.com/zulu/bin/zulu17.54.21-ca-jdk17.0.13-linux_aarch64.tar.gz
# Multi-architecture build for Java 8 and 17;
ARG TARGETARCH
RUN set -eux; \
case "${TARGETARCH}" in \
amd64) \
ARCH=x64; \
Z8_ID="zulu8.33.0.1-ca-jdk8.0.192"; \
EMBEDDED=""; \
Z17_ID="zulu17.54.21-ca-jdk17.0.13"; \
;; \
arm64) \
ARCH=aarch64; \
Z8_ID="zulu8.33.0.135-jdk1.8.0_192"; \
EMBEDDED="-embedded"; \
Z17_ID="zulu17.54.21-ca-jdk17.0.13"; \
;; \
*) \
echo "Unsupported arch ${TARGETARCH}"; exit 1;; \
esac && \
for keyval in "${Z8_ID}:java8" "${Z17_ID}:java17"; do \
IFS=':'; set -- $keyval; IFS=' '; \
BUILD_ID=$1; DEST=$2; \
URL="https://cdn.azul.com/zulu${EMBEDDED}/bin/${BUILD_ID}-linux_${ARCH}.tar.gz"; \
EMBEDDED=""; \
curl -fsSL -o /tmp/jdk.tgz "$URL"; \
mkdir -p /opt/${DEST}; \
tar -xzf /tmp/jdk.tgz --strip-components=1 -C /opt/${DEST}; \
rm -f /tmp/jdk.tgz*; \
done
# Register Java 8 (priority 200) and Java 17 (priority 100) as alternatives,
# including all matching slave links so every tool stays in sync.
RUN update-alternatives --install /usr/bin/java java /opt/java8/bin/java 200 \
--slave /usr/bin/javac javac /opt/java8/bin/javac \
--slave /usr/bin/jar jar /opt/java8/bin/jar \
--slave /usr/bin/javadoc javadoc /opt/java8/bin/javadoc \
--slave /usr/bin/jcmd jcmd /opt/java8/bin/jcmd \
--slave /usr/bin/jmap jmap /opt/java8/bin/jmap \
--slave /usr/bin/jps jps /opt/java8/bin/jps \
--slave /usr/bin/jstack jstack /opt/java8/bin/jstack \
--slave /usr/bin/keytool keytool /opt/java8/bin/keytool && \
update-alternatives --install /usr/bin/java java /opt/java17/bin/java 100 \
--slave /usr/bin/javac javac /opt/java17/bin/javac \
--slave /usr/bin/jar jar /opt/java17/bin/jar \
--slave /usr/bin/javadoc javadoc /opt/java17/bin/javadoc \
--slave /usr/bin/jshell jshell /opt/java17/bin/jshell \
--slave /usr/bin/jcmd jcmd /opt/java17/bin/jcmd \
--slave /usr/bin/jmap jmap /opt/java17/bin/jmap \
--slave /usr/bin/jps jps /opt/java17/bin/jps \
--slave /usr/bin/jstack jstack /opt/java17/bin/jstack \
--slave /usr/bin/keytool keytool /opt/java17/bin/keytool
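# Make Java 8 the default JVM by exporting JAVA_HOME and putting its bin directory first on the PATH: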
ENV JAVA_HOME=/opt/java8
ENV PATH="${JAVA_HOME}/bin:${PATH}"
# Make sure the pip we use is the one specified by Runtime 16.3 (24.2);
# bypass the system pip's PEP 668 restriction by not installing python3-pip through apt-get,
# and instead using the get-pip.py script here:
RUN set -eux; \
curl -fsSL https://bootstrap.pypa.io/get-pip.py -o /tmp/get-pip.py && \
python3 /tmp/get-pip.py pip==24.2.0 --break-system-packages --ignore-installed && \
rm -f /tmp/get-pip.py && \
pip --version
# Install all the required packages for the Runtime 16.3;
COPY requirements.txt /usr/local/venvs/requirements.txt
RUN python3 -m pip install --break-system-packages --ignore-installed -r /usr/local/venvs/requirements.txt
# https://archive.apache.org/dist/spark/spark-${SPARK_VER}/spark-${SPARK_VER}-bin-without-hadoop.tgz
# https://archive.apache.org/dist/spark/spark-3.5.2/spark-3.5.2-bin-without-hadoop.tgz
# Download Spark 3.5.2 WITHOUT Hadoop, verify its checksum to guard against tampering,
# and install PySpark into the local Python environment;
# • downloads & verifies the .sha512 checksum
# • unpacks under /opt/spark-<ver> -> /opt/spark symlink
# • removes tar & checksum in-layer to keep the image slim
ARG SPARK_VER=3.5.2
RUN set -eux; \
cd /opt; \
URL="https://archive.apache.org/dist/spark/spark-${SPARK_VER}/spark-${SPARK_VER}-bin-without-hadoop.tgz"; \
curl -fsSL -O "$URL"; \
curl -fsSL -O "${URL}.sha512"; \
sha512sum --check "spark-${SPARK_VER}-bin-without-hadoop.tgz.sha512"; \
tar -xzf "spark-${SPARK_VER}-bin-without-hadoop.tgz"; \
mv spark-${SPARK_VER}-bin-without-hadoop spark-${SPARK_VER}; \
ln -s spark-${SPARK_VER} spark; \
rm -f /opt/*.tgz* && \
# build a PySpark sdist & install it (matching the JVM bits)
cd /opt/spark/python; \
python3 setup.py -q sdist && \
python3 -m pip install --no-cache-dir --break-system-packages dist/pyspark-${SPARK_VER}.tar.gz && \
rm -rf build dist
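# Spark environment variables: SPARK_HOME for the symlinked install, SPARK_JAVA_HOME meant to point Spark at Java 17: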
ENV SPARK_HOME=/opt/spark
ENV SPARK_JAVA_HOME=/opt/java17
# Multi-architecture build for Hadoop 3.3.6,
# and setting environment variables so we can use Hadoop:
# • downloads & verifies the .sha512 checksum
# • unpacks under /opt/hadoop-<ver> -> /opt/hadoop symlink
# • removes tar & checksum in-layer to keep the image slim
ARG HADOOP_VER=3.3.6
RUN set -eux && \
cd /opt && \
case "$TARGETARCH" in \
amd64) \
FILE="hadoop-${HADOOP_VER}.tar.gz" ;; \
arm64) \
FILE="hadoop-${HADOOP_VER}-aarch64.tar.gz" ;; \
*) \
echo "Unsupported arch $TARGETARCH" >&2; exit 1 ;; \
esac && \
URL="https://downloads.apache.org/hadoop/common/hadoop-${HADOOP_VER}/${FILE}" && \
curl -fsSL --http1.1 -O "$URL" && \
curl -fsSL "${URL}.sha512" \
| sed "s|hadoop-${HADOOP_VER}-RC1\.tar\.gz|${FILE}|" \
> "${FILE}.sha512" && \
sha512sum --check "$FILE.sha512" && \
tar -xzf "$FILE" && \
ln -s "hadoop-${HADOOP_VER}" hadoop && \
rm -f "/opt/${FILE}" "/opt/${FILE}.sha512"
ENV HADOOP_HOME=/opt/hadoop
ENV PATH="$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH"
# We run 'hadoop classpath' in a shell to capture its output,
# and set it in the shell script to be read at Spark runtime;
# Alongside the classpath, we specify the use of optional jars for Azure connection;
# This also prepares the Spark environment to use Delta Lake:
RUN set -eux && \
SPARK_DIST_CP="$(hadoop classpath)" && \
echo "export SPARK_DIST_CLASSPATH=\"${SPARK_DIST_CP}:${HADOOP_HOME}/share/hadoop/tools/lib/*\"" \
>> /opt/spark/conf/spark-env.sh && \
echo 'spark.sql.extensions io.delta.sql.DeltaSparkSessionExtension' \
>> /opt/spark/conf/spark-defaults.conf && \
echo 'spark.sql.catalog.spark_catalog org.apache.spark.sql.delta.catalog.DeltaCatalog' \
>> /opt/spark/conf/spark-defaults.conf
# We also set the env variable to read the Spark config we just set:
ENV SPARK_CONF_DIR=/opt/spark/conf
# Download Delta Lake JARs compatible with Spark 3.5.2:
RUN set -eux && \
cd /opt/spark/jars && \
curl -fsSL -O https://repo1.maven.org/maven2/io/delta/delta-spark_2.12/3.3.0/delta-spark_2.12-3.3.0.jar && \
curl -fsSL -O https://repo1.maven.org/maven2/io/delta/delta-storage/3.3.0/delta-storage-3.3.0.jar
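For completeness, this is the kind of smoke test I run against the finished image to confirm the local Delta setup works end to end (again, my-dbr163-dev is a placeholder tag):

docker run --rm my-dbr163-dev python3 -c "
from pyspark.sql import SparkSession
# spark-defaults.conf baked into the image already registers the Delta extension and catalog
spark = SparkSession.builder.master('local[1]').getOrCreate()
spark.range(3).write.format('delta').mode('overwrite').save('/tmp/delta-smoke')
print(spark.read.format('delta').load('/tmp/delta-smoke').count())
"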
Hopefully this question doesn't fall under the following caveat in the guide to asking good questions:
Questions on professional server, networking, or related infrastructure administration are off-topic for Stack Overflow unless they directly involve programming or programming tools.
It seems the issue lies with the Java 8 version used: the u192 build, which matches the stated minimum requirement, cannot deal with the longer SHA digests on these JARs. I switched to u382 and the verification step passed.
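For anyone landing here with the same error, the change on the image side is just swapping the Java 8 build ID in the case statement of the Dockerfile above. The Zulu identifier for 8u382 below is what I believe the Azul CDN uses, so double-check it against their download listing before relying on it:

amd64) \
ARCH=x64; \
# 8u382 instead of 8u192; confirm this exact build ID on cdn.azul.com
Z8_ID="zulu8.72.0.17-ca-jdk8.0.382"; \
EMBEDDED=""; \
Z17_ID="zulu17.54.21-ca-jdk17.0.13"; \
;; \

The arm64 branch needs the matching aarch64 build of 8u382, which (unlike 8u192) should live under the plain zulu path rather than zulu-embedded; again, verify on the CDN.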