python, dbt

How do I run DBT models from a Python script or program?


I have a DBT project, and a Python script will pull data from the PostgreSQL database to produce output.

However, part of the Python script needs to trigger a DBT run. I haven't found a library that lets me start a DBT run from an external script, but I'm pretty sure one exists. How do I do this?

ETA: The correct answer may be to install the DBT CLI and then call it from Python with system calls. I was hoping for a library, but I'll take what I can get.


Solution

  • Update: v1.5 has arrived!

    With v1.5 of dbt, we get a stable and officially supported Python API for invoking dbt operations; this API has functional parity with the CLI.

    From the docs:

    from dbt.cli.main import dbtRunner, dbtRunnerResult
    
    # initialize
    dbt = dbtRunner()
    
    # create CLI args as a list of strings
    cli_args = ["run", "--select", "tag:my_tag"]
    
    # run the command
    res: dbtRunnerResult = dbt.invoke(cli_args)
    
    # inspect the results
    for r in res.result:
        print(f"{r.node.name}: {r.status}")
    

    There are some caveats about the stability of artifacts returned by dbt.invoke; read the docs for more details.

    Original Answer

    (As of Jan 2023) There is no public Python API for dbt yet. One is expected in v1.5, which should be out in a couple of months.

    Right now, your safest option is to use the CLI. If you don't want to use subprocess, the CLI is built on Click now, and Click provides a runner that you can use to invoke Click commands. It's usually used for testing, but I think it would work for your use case, too. The run command is defined in dbt.cli.main. That would look something like:

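    from click.testing import CliRunner
    from dbt.cli.main import run
    
    # Click's test runner can invoke a command object in-process
    dbt_runner = CliRunner()
    # equivalent to `dbt run -s my_model` on the command line
    dbt_runner.invoke(run, args="-s my_model")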
    

    You could also invoke dbt the way they do in the test suite, using run_dbt.
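
    For completeness, the subprocess route mentioned above could be sketched roughly as follows. This assumes the dbt executable is on the PATH; build_dbt_command and run_dbt_cli are hypothetical helper names, and my_model is a placeholder selector.

```python
import subprocess


def build_dbt_command(select, project_dir=None):
    """Build the argument list for `dbt run --select <select>`."""
    cmd = ["dbt", "run", "--select", select]
    if project_dir is not None:
        # Point dbt at a project outside the current working directory.
        cmd += ["--project-dir", project_dir]
    return cmd


def run_dbt_cli(select, project_dir=None):
    """Run dbt as a child process and return its standard output.

    Raises subprocess.CalledProcessError if dbt exits with a non-zero code.
    """
    completed = subprocess.run(
        build_dbt_command(select, project_dir),
        check=True,           # turn a failed run into an exception
        capture_output=True,  # collect stdout/stderr instead of printing
        text=True,            # decode output as str
    )
    return completed.stdout
```

    Passing the arguments as a list avoids shell quoting problems, and check=True means the rest of the script stops if the dbt run fails.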