databricksdbt

How can I stop dbt inserting an invalid character into a table creation script?


I'm using dbt on a Databricks platform. I'm creating a new table - something I've done dozens of times before without problems, but this one's making dbt choke. Here's my sql script:

{% set key_columns = ['transaction_originator','dlk_bnz_insert_timestamp'] %}

{{
    config(alias='bnz_std_sqlbi01_user_transaction_file',
    materialized='table',
    unique_key=key_columns,
    file_format='delta'
    )
}}
SELECT
   TRIM(CAST(transaction_originator AS STRING)) AS transaction_originator,
   CAST(ab_number AS BIGINT) AS ab_number,
   TRIM(CAST(alpha_name AS STRING)) AS alpha_name,
   TRIM(CAST(emcu AS STRING)) AS emcu,
   CASE WHEN LOWER(active_user) = 'true' THEN True ELSE False END AS active_user,
   TRIM(CAST(location AS STRING)) AS location,
   TRIM(CAST(status AS STRING)) AS status,
   'sqlbi01' AS dlk_src_name,
   _metadata.file_modification_time AS dlk_bnz_insert_timestamp,
   CURRENT_TIMESTAMP() AS dlk_bnz_std_insert_timestamp
FROM {{ source('bronze', 'bnz_raw_sqlbi01_user_transaction_file') }}
WHERE transaction_originator IS NOT NULL

There's a matching set of entries in the schema.yml that I won't post because I don't think they're relevant. When I run dbt to create & populate the table, it fails, the relevant bit of the log file saying this:

14:09:49    Database Error in model bnz_std_sqlbi01_user_transaction_file (models/bronze/bnz_std_sqlbi01_user_transaction_file.sql)
  
  [PARSE_SYNTAX_ERROR] Syntax error at or near ''. SQLSTATE: 42601 (line 16, pos 6)
  
  == SQL ==
  /* {"app": "dbt", "dbt_version": "1.9.4", "dbt_databricks_version": "1.10.1", "databricks_sql_connector_version": "4.0.3", "profile_name": "datalakehouse", "target_name": "databricks_cluster", "node_id": "model.datalakehouse.bnz_std_sqlbi01_user_transaction_file"} */
  
    
      
          create or replace table `dev`.`bronze`.`bnz_std_sqlbi01_user_transaction_file`
        
        using delta
        
        
        
        
        
        
        
        as
        
  ------^^^
  
  
  
  
  SELECT
     TRIM(CAST(transaction_originator AS STRING)) AS transaction_originator,
     CAST(ab_number AS BIGINT) AS ab_number,
     TRIM(CAST(alpha_name AS STRING)) AS alpha_name,
     TRIM(CAST(emcu AS STRING)) AS emcu,
     CASE WHEN LOWER(active_user) = 'true' THEN True ELSE False END AS active_user,
     TRIM(CAST(location AS STRING)) AS location,
     TRIM(CAST(status AS STRING)) AS status,
     'sqlbi01' AS dlk_src_name,
     _metadata.file_modification_time AS dlk_bnz_insert_timestamp,
     CURRENT_TIMESTAMP() AS dlk_bnz_std_insert_timestamp
  FROM `prod`.`bronze`.`bnz_raw_sqlbi01_user_transaction_file`
  WHERE transaction_originator IS NOT NULL
  
  compiled code at target/run/datalakehouse/models/bronze/bnz_std_sqlbi01_user_transaction_file.sql
14:09:49  

The specific character it's choking on, the one under the as, is a U+FEFF character that's being introduced from somewhere. I've checked and double checked my source files for unprintable characters - they're clean.

Where the heck is this character coming from, and how can I get rid of it?


Solution

  • OK, so I never figured out what was going wrong here, but abandoning the git patch I was working on and starting a new one from scratch got it to work.

    Just some random glitch, I guess.