I'm trying to build/schedule a dbt project (sourced in Azure DevOps) in Databricks Workflows. However, whenever I run dbt there, I get the following error message:
CalledProcessError: Command 'b'\nmkdir -p "/tmp/tmp-dbt-run-1124228490001263"\nunexpected_errors="$(cp -a -u "/Workspace/Repos/.internal/085c4ffe5e_commits/16113d05ffd8cd7b148ed973080aa51439e98b0c/." "/tmp/tmp-dbt-run-1124228490001263" 2> >(grep -v \'Operation not supported\'))"\nif [[ -n "$unexpected_errors" ]]; then\n >&2 echo -e "Unexpected error(s) encountered while copying:\n$unexpected_errors"\n exit 1\nfi\n returned non-zero exit status 1.
Unexpected error(s) encountered while copying:
cp: cannot stat '/Workspace/Repos/.internal/085c4ffe5e_commits/16113d05ffd8cd7b148ed973080aa51439e98b0c/./venv/share/doc/networkx-3.1/examples/3d_drawing/__pycache__': No such file or directory
cp: cannot stat '/Workspace/Repos/.internal/085c4ffe5e_commits/16113d05ffd8cd7b148ed973080aa51439e98b0c/./venv/share/doc/networkx-3.1/examples/algorithms/__pycache__': No such file or directory
cp: cannot stat '/Workspace/Repos/.internal/085c4ffe5e_commits/16113d05ffd8cd7b148ed973080aa51439e98b0c/./venv/share/doc/networkx-3.1/examples/basic/__pycache__': No such file or directory
cp: cannot stat '/Workspace/Repos/.internal/085c4ffe5e_commits/16113d05ffd8cd7b148ed973080aa51439e98b0c/./venv/share/doc/networkx-3.1/examples/drawing/__pycache__': No such file or directory
cp: cannot stat '/Workspace/Repos/.internal/085c4ffe5e_commits/16113d05ffd8cd7b148ed973080aa51439e98b0c/./venv/share/doc/networkx-3.1/examples/graph/__pycache__': No such file or directory
cp: cannot stat '/Workspace/Repos/.internal/085c4ffe5e_commits/16113d05ffd8cd7b148ed973080aa51439e98b0c/./venv/share/doc/networkx-3.1/examples/subclass/__pycache__': No such file or directory
I gather the issue arises at the point where the repo files are copied, but I don't know how to solve it. Any ideas?
These are the task settings:
resources:
  jobs:
    otd:
      name: otd
      email_notifications:
        on_failure:
          - mauricio.schwartsman@xxxxxxxx.com
        no_alert_for_skipped_runs: true
      notification_settings:
        no_alert_for_skipped_runs: true
        no_alert_for_canceled_runs: true
      tasks:
        - task_key: otd_dbt
          dbt_task:
            project_directory: ""
            commands:
              - dbt deps
              - dbt build -s +otd_total
            schema: gold
            warehouse_id: xxxxxxxxxxx
            catalog: logistics_prd
            source: GIT
          job_cluster_key: dbt_CLI
          libraries:
            - pypi:
                package: dbt-databricks>=1.0.0,<2.0.0
      job_clusters:
        - job_cluster_key: dbt_CLI
          new_cluster:
            cluster_name: ""
            spark_version: 15.4.x-scala2.12
            spark_conf:
              spark.master: local[*, 4]
              spark.databricks.cluster.profile: singleNode
            azure_attributes:
              first_on_demand: 1
              availability: ON_DEMAND_AZURE
              spot_bid_max_price: -1
            node_type_id: Standard_D4ds_v5
            custom_tags:
              ResourceClass: SingleNode
            spark_env_vars:
              PYSPARK_PYTHON: /databricks/python3/bin/python3
            enable_elastic_disk: true
            data_security_mode: SINGLE_USER
            runtime_engine: PHOTON
            num_workers: 0
      git_source:
        git_url: https://dev.azure.com/copa-energia/Logistics/_git/dbt_logistica
        git_provider: azureDevOpsServices
        git_branch: main
      queue:
        enabled: true
Please feel free to ask me for more details.
As it turns out, the solution was simpler than I expected.
Since those files are not necessary, I could simply remove them from the repo and add these entries to .gitignore (see the cleanup sketch below):
venv/
__pycache__/
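For anyone hitting the same error, this is roughly the cleanup I mean, run from the repo root. The commit message is just an example, and it assumes __pycache__ only appears inside venv/ (as in my error log); if other __pycache__ folders are tracked elsewhere, untrack them the same way.

# Untrack the committed virtual environment
# (the files stay on disk; they are only removed from Git's index).
git rm -r --cached venv/

# Ignore both directories going forward.
printf 'venv/\n__pycache__/\n' >> .gitignore

# Commit and push so the Databricks Git source picks up the cleaned-up repo.
git add .gitignore
git commit -m "Stop tracking venv and __pycache__"
git push origin main

Nothing in venv/ is needed at runtime anyway, since dbt-databricks is installed through the job's libraries setting, so once the checkout no longer contains those directories the copy step runs cleanly.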