I have a dbt project which gets the data from MySQL loads in RAW db in snowflake, transforms and loads in ANALYTICS db in snowflake. Now I have a requirement to create a table in a third database PROD in snowflake which aggregates data from ANALYTICS and loads in PROD.
.
If I change the DBT_DATABASE variable to PROD, I get an error since existing dbt models fail. How do I create another variable which points target database to PROD
There are several ways to configure the target database for dbt-snowflake, since Snowflake databases are just logical concepts and share a single connection.
Let's say your data in RAW
is configured as a source. You can add a .yml
file with the source info, specifying the database:
version: 2
sources:
- name: mysql_replica
database: RAW
tables:
- name: mysql_table
Now we can add a model that references that source and lands the transformed data into the ANALYTICS
database. We can do this by setting the target's default database to ANALYTICS
in our profiles.yml
:
my-snowflake-db:
target: dev
outputs:
dev:
type: snowflake
account: [account id]
# User/password auth
user: [username]
password: [password]
role: [user role]
database: ANALYTICS # or "{{ env_var('DBT_DATABASE') }}"
warehouse: [warehouse name]
schema: [dbt schema]
threads: [1 or more]
Now we can write our model, without config, to use the database we specified in our target:
-- my_model.sql
select * from {{ source("mysql_replica", "mysql_table") }}
You can also specify a "custom" database in the model config (docs). Unlike custom schemas, which have some complex behavior, models with custom databases just use the database you specify. So in another model, we can specify the database in a config block in the model file:
-- my_prod_model.sql
{{ config(database="PROD") }}
select * from {{ ref('my_model') }}
If you prefer (in recent versions of dbt), you can use the config:
key in a properties file instead:
# my_prod_model.yml
version: 2
models:
- name: my_prod_model
config:
database: PROD
Or you can configure a custom database for a single model, or directory of models, in your dbt_project.yml
file:
# dbt_project.yml
profile: my-snowflake-db
# other stuff up here
...
models:
# materialize models in the prod directory as tables in PROD database
prod:
+database: PROD
+materialized: table
FINALLY, you can use "dev" schemas in your databases to separate dev from prod, but if you don't want any dev runs of dbt to write to your PROD database, you can use a little jinja in your .yml
files:
# dbt_project.yml
profile: my-snowflake-db
# other stuff up here
...
models:
# materialize models in the prod directory as tables in PROD database
# if target is prod; otherwise, write to the DEV database
prod:
+database: "{{ 'PROD' if target.name == 'prod' else 'DEV' }}"
+materialized: table