I have been trying to define an external table in dbt using the dbt_external_tables package. Running the command

```shell
dbt run-operation stage_external_sources --args "select: external_s3_stage.tbl_s3_stage"
```

does not produce an error; it only prints the message

```
No external sources selected
```

on every run. I suspect my issue is related to the location of my yml file.

Is there another way to specify which file defines the external table, or is there any other configuration I should add to my dbt_project.yml?
packages.yml

```yaml
packages:
  - package: dbt-labs/dbt_external_tables
    version: 0.8.7
```
models/staging/ext_stage_source.yml

```yaml
version: 2

sources:
  - name: external_s3_stage
    database: test_conn
    schema: public
    loader: S3
    tables:
      - name: tbl_s3_stage
        ext_full_refresh: true
        description: This is the first external table created as stage from dbt.
        external:
          location: "s3://bucket/data/"
          file_format: >
            (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"' SKIP_HEADER = 1 TRIM_SPACE = TRUE ESCAPE_UNENCLOSED_FIELD = NONE FIELD_DELIMITER = ',')
        columns:
          - name: id
            data_type: varchar
          - name: name
            data_type: varchar
```
My log message looks like this:

```
06:40:18 Running with dbt=1.6.3
06:40:18 Registered adapter: snowflake=1.6.3
06:40:18 Found 2 models, 4 tests, 1 source, 0 exposures, 0 metrics, 419 macros, 0 groups, 0 semantic models
06:40:18 No external sources selected
```
I also want to mention that both `dbt debug` and `dbt run` execute with no issues. I am using dbt-core for this task.
Solution
Now it works. I reviewed the already-answered questions in the dbt Slack group and saw that my yml file required extra indentation.

Also, based on the official docs, the external table needs a working Snowflake stage already in place, which means both a storage integration and a stage must exist. To create them, please follow the official docs.
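For reference, here is a minimal sketch of what those two Snowflake objects might look like. The integration name, role ARN, and bucket URL are hypothetical placeholders; adapt them to your own account and follow the official Snowflake docs for the full setup:

```sql
-- Hypothetical storage integration granting Snowflake access to the S3 bucket
CREATE STORAGE INTEGRATION my_s3_integration
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/my-snowflake-role'
  STORAGE_ALLOWED_LOCATIONS = ('s3://bucket/data/');

-- Stage that the dbt source later references as external.location ("@MY_S3_STAGE1")
CREATE STAGE MY_S3_STAGE1
  STORAGE_INTEGRATION = my_s3_integration
  URL = 's3://bucket/data/';
```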
And finally, to execute it I used the following line:
```shell
dbt run-operation stage_external_sources
```
```yaml
version: 2

sources:
  - name: external_s3
    database: T_CONNECTION
    schema: T_SCHEMA
    loader: S3
    tables:
      - name: my_s3_stage01
        ext_full_refresh: true
        description: This is the first external table created as stage from dbt.
        columns:
          - name: id
            data_type: varchar
          - name: email
            data_type: varchar
        external:
          location: "@MY_S3_STAGE1"
          file_format: '( type = csv skip_header = 1 )'
```
Results
PS
In case you execute a full refresh and it is not refreshing the data but just SKIPPING, you can add the following var to your dbt command:

```shell
dbt run-operation stage_external_sources --vars "ext_full_refresh: true"
```
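The selector and the var can also be combined to force a refresh of a single table; as a sketch, using the source and table names from the example above:

```shell
dbt run-operation stage_external_sources \
  --args "select: external_s3.my_s3_stage01" \
  --vars "ext_full_refresh: true"
```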