pythonsqldbt

Extracting ref and source tables from a dbt model


I have a model in dbt which uses different sources and ref tables. So my question is... Is it possible to create a macro that will extract all the ref and source tables from the model? Let me give you and example

I have this model:

    select
*

from        {{ ref('x') }}   a
left join   {{ source('landing', 'y') }} ma on a.... = ma....
left join   {{ source('landing', 'z') }} b on a.... = b....

So, once the macro is run for that model, I would like to get the names or the ref and source tables. Is that possible? I would like to do this in dbt, but let me know if it would be better to use python instead.

Thanks!


Solution

  • -- my_macro.sql
    
    {% macro get_dependencies(model_name) %}
        {% set models = graph.nodes.values() %}
    
        {% set model = (models | selectattr('name', 'equalto', model_name) | list).pop() %}
    
        {% do log("sources: ", info=true) %}
        {% set sources = model['sources'] %}
        {% for source in sources %}
        {% do log(source[0] ~ '.' ~ source[1], info=true) %}
        {% endfor %}
    
    
        {% do log("refs: ", info=true) %}
        {% set refs = model['refs'] %}
        {% for ref in refs %}
        {% do log(ref[0], info=true) %}
        {% endfor %}
    
    {% endmacro %}
    

    Once you add this macro, give it a try with this…

    dbt run-operation get_dependencies --args '{model_name: my_model}'

    and it will spit out…

    sources:
    landing.y
    landing.z
    refs:
    x
    

    The way I currently have it written, it will write out any repeating sources or models. e.g. if you have {{ ref('x') }} in multiple locations, then it will write it out as many times.

    refs:
    x
    x
    x
    

    I used the below documentaiton about the graph variable to piece this together.

    https://docs.getdbt.com/reference/dbt-jinja-functions/graph

    EDIT:

    Here's a way to get a unique list of sources and refs. It's not the prettiest but it works. (There has to be better way to get a unique list!)

    {% macro get_dependencies(model_name) %}
        {% set models = graph.nodes.values() %}
    
        {% set model = (models | selectattr('name', 'equalto', model_name) | list).pop() %}
        {% set sources = model['sources'] %}
        {% set refs = model['refs'] %}
    
    
        {% do log("sources: ", info=true) %}
        {% set unique_sources = [] %}    
        {% for source in sources if source not in unique_sources %}
            {% do unique_sources.append(source) %}
            {% do log(source[0] ~ '.' ~ source[1], info=true) %}
        {% endfor %}
    
        {% do log("refs: ", info=true) %}
        {% set unique_refs = [] %}    
        {% for ref in refs if reference not in unique_refs %}
            {% do unique_refs.append(ref) %}
            {% do log(ref[0], info=true) %}
        {% endfor %}
    
    {% endmacro %}