When you create a Python notebook and follow the various examples for setting up Databricks Delta Live Tables, you will immediately be met with the following error if you attempt to run your notebook:
ModuleNotFoundError: No module named 'dlt'
A self-sufficient developer may then attempt to resolve this with a "magic command" to install said module: %pip install dlt
But alas, the dlt package on PyPI is dltHub's "data load tool" and has nothing to do with Databricks Delta Live Tables. Running your code will now raise the error:
AttributeError: module 'dlt' has no attribute 'table'
(or a similar error, depending on which dlt attribute you attempted to use first)
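You can confirm the mix-up from the notebook itself with a quick (and entirely generic) check:

import dlt
print(dlt.__file__)  # Points at the package %pip just installed, not anything Databricks provides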
What's going on? How do you run your Delta Live Tables pipeline setup code?
While you are expected to compose your Delta Live Tables setup code in the Databricks notebook environment, you are not meant to run it there. The only supported way to run the code is to head over to the Pipelines interface and launch it from a pipeline.
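(If you would rather script that step than click through the UI, pipelines can also be created via the Databricks REST API. Here's a minimal sketch; the workspace URL, token, pipeline name, and notebook path are placeholders you would substitute with your own:)

import requests

WORKSPACE_URL = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"  # placeholder

resp = requests.post(
    f"{WORKSPACE_URL}/api/2.0/pipelines",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "name": "widgets-pipeline",  # placeholder name
        "libraries": [{"notebook": {"path": "/Users/me/widgets_dlt_notebook"}}],  # placeholder path
    },
)
resp.raise_for_status()
print(resp.json())  # The response includes the new pipeline's id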
End of Answer.
Although....
This is bad news for developers who have written a lot of code and aren't even sure it's syntactically valid (since the Databricks IDE offers only limited real-time feedback). You are now stuck waiting for your pipeline to spin up resources, start, and fail, then digging through the stack trace to figure out where you went wrong. You're stuck with that workflow for logical errors, but you don't have to be stuck with it for syntax errors.
Here is a workaround I came up with:
try:
    import dlt  # When run in a pipeline, this module will exist (there's no way to import it here)
except ImportError:
    class dlt:  # "Mock" the dlt module so that we can syntax check the rest of our Python in the Databricks notebook editor
        @staticmethod
        def table(comment="", **options):  # Mock the @dlt.table decorator so that it is seen as syntactically valid below
            def _(f):
                return f  # Hand the decorated function back unchanged
            return _

@dlt.table(comment="Raw Widget Data")
def widgets_raw():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .option("header", "true")
        .option("sep", "|")
        .load("/mnt/LandingZone/EMRAW/widgets")
    )
The trick here is that I'm mocking out the dlt module with the bare minimum needed to pass syntax checks, so the rest of my code can be verified.
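The same idea extends if your pipeline uses more of the API. Here's a sketch along the same lines (dlt.view and the expectation decorators are real parts of the Delta Live Tables API; only the decorators need mocking, since calls like dlt.read live inside function bodies that never execute in the notebook):

try:
    import dlt  # Only exists inside a pipeline run
except ImportError:
    def _mock_decorator(*args, **kwargs):
        # Accepts any arguments and hands the decorated function back unchanged
        def _(f):
            return f
        return _

    class dlt:
        table = staticmethod(_mock_decorator)
        view = staticmethod(_mock_decorator)
        expect = staticmethod(_mock_decorator)
        expect_or_drop = staticmethod(_mock_decorator)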
The annoying thing is that SQL notebooks don't have this problem; when you run them, you get the pleasing message:
This Delta Live Tables query is syntactically valid, but you must create a pipeline in order to define and populate your table.
Unfortunately, I find SQL notebooks limiting in other ways, so pick your poison.
Either way, hopefully it's clear that your code won't actually do anything until you run it in a pipeline. The notebook is just for setup, and it's nice to get as many syntax checks out of the way up front before you have to start troubleshooting from the Pipelines UI.