sql · google-bigquery · jinja2 · dbt

How do I run a SQL model in dbt multiple times by looping through variables?


I have a model in dbt (test_model) that accepts a geography variable (zip, state, or region) in its configuration. I would like to run the model three times, looping through those values and using a different one on each run.

Here's the catch: I have a macro, shown below, that appends the variable's value to the output table name (e.g., running test_model with zip as the value produces a table called test_model_zip). This is accomplished by adding {{ config(alias=var('geo')) }} at the top of the model.

Whether I define the variable in dbt_project.yml, in the model itself, or on the CLI, I haven't found a way to loop through these values, passing each one to the configuration in turn, so that all three tables are created. Does anyone have an idea how to accomplish this? FWIW, I'm using BigQuery SQL.

The macro:

{% macro generate_alias_name(custom_alias_name=none, node=none) -%}

    {#- No alias configured: fall back to the model's own name -#}
    {%- if custom_alias_name is none -%}

        {{ node.name }}

    {%- else -%}

        {#- Alias configured: append it to the model name, e.g. test_model_zip -#}
        {% set node_name = node.name ~ '_' ~ custom_alias_name %}
        {{ node_name | trim }}

    {%- endif -%}

{%- endmacro %}

The model, run by entering dbt run --select test_model.sql --vars '{"geo": "zip"}' in the CLI:

{{ config(materialized='table', alias=var('geo')) }}

with query as (select 1 as id)

select * from query

The current output: a single table called test_model_zip.

The desired output: three tables called test_model_zip, test_model_state, and test_model_region.


Solution

  • I would flip this on its head.

    dbt doesn't really have a concept for parameterized models, so if you materialize a single model in multiple places, you'll lose lineage (the DAG relationship) and docs/etc. will get all confused.

    Much better to create multiple model files that simply call a macro with a different parameter, like this:

    geo_model_macro.sql

    {% macro geo_model_macro(grain) %}
    select
        {{ grain }},
        count(*)
    from {{ ref('my_upstream_table') }}
    group by 1
    {% endmacro %}
    

    my_model_zip.sql

    {{ geo_model_macro('zip') }}
    

    my_model_state.sql

    {{ geo_model_macro('state') }}
    

    my_model_region.sql

    {{ geo_model_macro('region') }}
    

    If I needed to do this hundreds of times (instead of 3), I would either:

    1. Create a script that generates all of these .sql files for me (a minimal sketch follows after this list), or
    2. Create a new materialization that accepts a list of parameters, though this would be a super-advanced, here-be-dragons approach that is probably only appropriate once you've maxed out your other options.
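
    As a rough illustration of option 1, here is a minimal sketch of what such a generator script could look like. It assumes it is run from the project root, that models live in a models/ directory, and that the geo_model_macro above exists; the script name, file names, and the three geographies are just placeholders for whatever list you actually have:

    generate_geo_models.py

    # Hypothetical helper: write one thin model file per geography,
    # each of which simply calls geo_model_macro with its own grain.
    from pathlib import Path

    geographies = ["zip", "state", "region"]  # placeholder list; could be hundreds of values

    for geo in geographies:
        # Each generated file contains a single macro call, e.g. {{ geo_model_macro('zip') }}
        Path(f"models/my_model_{geo}.sql").write_text(
            f"{{{{ geo_model_macro('{geo}') }}}}\n"
        )

    Once the files exist in models/, dbt picks them up like any other models on the next dbt run.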