I have a dbt project, and a Python script that will pull data from PostgreSQL to produce output.
However, part of the Python script needs to trigger a dbt run. I haven't found a library that lets me start a dbt run from an external script, but I'm pretty sure one exists. How do I do this?
ETA: The correct answer may be to install the dbt CLI and then call it from Python with system calls. I was hoping for a library, but I'll take what I can get.
With v1.5 of dbt, we get a stable and officially supported Python API for invoking dbt operations; this API has functional parity with the CLI.
From the docs:
from dbt.cli.main import dbtRunner, dbtRunnerResult
# initialize
dbt = dbtRunner()
# create CLI args as a list of strings
cli_args = ["run", "--select", "tag:my_tag"]
# run the command
res: dbtRunnerResult = dbt.invoke(cli_args)
# inspect the results
for r in res.result:
    print(f"{r.node.name}: {r.status}")
There are some caveats about the stability of artifacts returned by dbt.invoke; read the docs for more details.
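Since the script goes on to read from the database, it may be worth checking that the run actually succeeded before continuing. Here is a minimal sketch, assuming dbt-core 1.5 or later and relying on the `success` and `exception` attributes of `dbtRunnerResult`. The function name `run_dbt_or_raise` and the optional `runner` parameter are my own additions for illustration, not part of dbt:

```python
from typing import Any, List, Optional


def run_dbt_or_raise(cli_args: List[str], runner: Optional[Any] = None) -> Any:
    """Invoke dbt and raise if the invocation did not succeed.

    `runner` is a hypothetical hook so a stand-in object can be passed in
    (for example in tests); by default a real dbtRunner is created.
    """
    if runner is None:
        # Imported lazily so this module loads even where dbt is not installed.
        from dbt.cli.main import dbtRunner  # requires dbt-core >= 1.5

        runner = dbtRunner()

    res = runner.invoke(cli_args)

    # An unhandled error inside dbt is reported on `exception`.
    if res.exception is not None:
        raise RuntimeError(f"dbt raised an error for {cli_args}") from res.exception

    # `success` is False when dbt completed but something failed,
    # such as a model or test.
    if not res.success:
        raise RuntimeError(f"dbt finished with failures for {cli_args}")

    return res.result
```

You would call something like `run_dbt_or_raise(["run", "--select", "tag:my_tag"])` at the point where the script needs fresh models, and only then query PostgreSQL.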
(As of Jan 2023) There is not a public Python API for dbt, yet. It is expected in v1.5, which should be out in a couple months.
Right now, your safest option is to use the CLI. If you don't want to use subprocess, the CLI uses Click now, and Click provides a runner that you can use to invoke Click commands. It's usually used for testing, but I think it would work for your use case, too. The CLI command is here. That would look something like:
from click.testing import CliRunner
from dbt.cli.main import run
dbt_runner = CliRunner()
dbt_runner.invoke(run, args="-s my_model")
You could also invoke dbt the way they do in the test suite, using run_dbt.
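If shelling out to the CLI is acceptable after all, here is a rough sketch of the subprocess route, assuming the `dbt` executable is on your PATH. The helper names `build_dbt_command` and `run_dbt_cli`, and the `executable` parameter, are hypothetical and only there for illustration; `--project-dir` is a standard dbt CLI flag:

```python
import subprocess
from typing import List, Tuple


def build_dbt_command(
    args: List[str], project_dir: str = ".", executable: str = "dbt"
) -> List[str]:
    """Build the argument list for a dbt CLI call (no shell involved)."""
    return [executable, *args, "--project-dir", project_dir]


def run_dbt_cli(
    args: List[str], project_dir: str = ".", executable: str = "dbt"
) -> Tuple[int, str]:
    """Run dbt as a child process and return (exit code, captured stdout)."""
    cmd = build_dbt_command(args, project_dir=project_dir, executable=executable)
    # check=False: the caller inspects the exit code instead of getting
    # a CalledProcessError on failure.
    completed = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return completed.returncode, completed.stdout
```

For example, `code, out = run_dbt_cli(["run", "--select", "my_model"])`; dbt exits with 0 when the invocation succeeds, so anything else should stop the script before it queries the database.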