Can I use a dbt macro to modify the column names in AWS redshift?
Example changes to be made:
"checkedAt" --> checked_at
"helloWorld" --> hello_world
"testKitchenNew" --> test_kitchen_new
I tried to write a sample macro like this:
{% macro modify_col_names(column_name) %}
{% set column_name = column_name | replace('A', '_a') %}
{% set column_name = column_name | replace('K', '_k') %}
{% set column_name = column_name | replace('N', '_n') %}
{% set column_name = column_name | replace('W', '_w') %}
{{ column_name | lower }}
{% endmacro %}
Now I am trying to use it like this:
with raw_data as (
select
*
from
{{ source('src', 'TableName') }}
),
transformed_data as (
select
{{ modify_col_names(['checkedAt', 'helloWorld', 'testKitchenNew']) }}
from
raw_data
)
select
*
from
transformed_data
But this won't actually modify the column names and would throw an error like this:
Database Error in model xx (models/test.sql)
10:32:24 syntax error at or near "'checked_at'"
How can I actually modify the column names?
P.S: I would need to do it for all columns of the table so don't necessarily need to specify the column names manually.
To convert column names from CamelCase to snake_case you can use:
get_columns_in_relation
from dbt adapterconvert_column_names.sql
macro:
{% macro convert_column_names(relation) %}
{% set columns = adapter.get_columns_in_relation(relation) %}
{% set re = modules.re %}
{% set query %}
SELECT
{% for column in columns %}
"{{ column.column }}" AS {{ re.sub("(?<!^)([A-Z][a-z]|(?<=[a-z])[^a-z]|(?<=[A-Z])[0-9_])", "_\g<1>", column.column).lower() }}
{%- if not loop.last %},{% endif -%}
{% endfor %}
FROM {{ relation }}
{% endset %}
{{ return(query) }}
{% endmacro %}
Usage in model (you should either use source()
or ref()
:
{{ convert_column_names(source('src', 'TableName')) }}
Note, that some databases (for example, Snowflake) automatically convert column names to upper case. In that case this approach would not work.