regex postgresql macros amazon-redshift dbt

change column names using a dbt macro

Can I use a dbt macro to modify the column names in AWS redshift?

Example changes to be made:

"checkedAt" --> checked_at
"helloWorld" --> hello_world
"testKitchenNew" --> test_kitchen_new

I tried to write a sample macro like this:

{% macro modify_col_names(column_name) %}
  {% set column_name = column_name | replace('A', '_a') %}
  {% set column_name = column_name | replace('K', '_k') %}
  {% set column_name = column_name | replace('N', '_n') %}
  {% set column_name = column_name | replace('W', '_w') %}
  {{ column_name | lower }}
{% endmacro %}

Now I am trying to use it like this:

with raw_data as (
    select
        *
    from
        {{ source('src', 'TableName') }}
),
transformed_data as (
    select
        {{ modify_col_names(['checkedAt', 'helloWorld', 'testKitchenNew']) }}
    from
        raw_data
)
select
    *
from
    transformed_data

But this won't actually modify the column names and would throw an error like this:

Database Error in model xx (models/test.sql)
10:32:24    syntax error at or near "'checked_at'"

How can I actually modify the column names?

P.S: I would need to do it for all columns of the table so don't necessarily need to specify the column names manually.

Solution

To convert column names from CamelCase to snake_case you can use:

get_columns_in_relation from dbt adapter
regex python module available in dbt

convert_column_names.sql macro:

{% macro convert_column_names(relation) %}

    {% set columns = adapter.get_columns_in_relation(relation) %}
    {% set re = modules.re %}

    {% set query  %}
    SELECT
    {% for column in columns %}
        "{{ column.column }}" AS {{ re.sub("(?<!^)([A-Z][a-z]|(?<=[a-z])[^a-z]|(?<=[A-Z])[0-9_])", "_\g<1>", column.column).lower() }}
    {%- if not loop.last %},{% endif -%}
    {% endfor %}
    FROM {{ relation }}
    {% endset %}

    {{ return(query) }}

{% endmacro %}

Usage in model (you should either use source() or ref():

{{ convert_column_names(source('src', 'TableName')) }}

Note, that some databases (for example, Snowflake) automatically convert column names to upper case. In that case this approach would not work.