terraform, databricks, azure-databricks, terraform-provider-databricks, databricks-workflows

How to create Azure Databricks jobs of type Python wheel with Terraform


I am using Terraform to implement a Databricks job in Azure. I have a Python wheel that I need to execute in this job. From the Terraform Azure Databricks documentation at this link I know how to implement a Databricks notebook job. However, what I need is a Databricks job of type "Python wheel". All the examples provided in the linked documentation create jobs of type "notebook_task", "spark_jar_task", or "pipeline_task"; none of them is what I need. If you look in the Databricks workspace, however, you can see that there is a specific job type "Python wheel". You can see this in the workspace below:

[Screenshot: Databricks workspace showing the "Python wheel" job type]

To elaborate, following the documentation I have already created a job. The following is my main.tf file:

resource "databricks_notebook" "this" {
  path     = "/Users/myusername/${var.notebook_subdirectory}/${var.notebook_filename}"
  language = var.notebook_language
  source   = "./${var.notebook_filename}"
}

resource "databricks_job" "sample-tf-job" {
  name = var.job_name
  existing_cluster_id = "0342-285291-x0vbdshv"  ## databricks_cluster.this.cluster_id
  notebook_task {
    notebook_path = databricks_notebook.this.path 
  }
} 

As I said, this job is of type "Notebook", which is also shown in the screenshot. The job I need is of type "Python wheel".

I am pretty sure Terraform already provides the capability to create "Python wheel" jobs: looking at the source code of the Terraform provider for Databricks, I can see that a Python wheel task is defined (currently at line 49). However, it is not clear to me how to call it from my code. Below is the source code I am referring to:

// PythonWheelTask contains the information for python wheel jobs
type PythonWheelTask struct {
    EntryPoint      string            `json:"entry_point,omitempty"`
    PackageName     string            `json:"package_name,omitempty"`
    Parameters      []string          `json:"parameters,omitempty"`
    NamedParameters map[string]string `json:"named_parameters,omitempty"`
}

Solution

  • Instead of notebook_task you just need to use the python_wheel_task configuration block, as described in the provider documentation. Something like this:

    resource "databricks_job" "sample-tf-job" {
      name = var.job_name
    
      task {
        task_key = "a"
        existing_cluster_id = "0342-285291-x0vbdshv"  ## databricks_cluster.this.cluster_id
        python_wheel_task {
          package_name = "my_package"
          entry_point = "entry_point"
        }
        library {
          whl = "dbfs:/FileStore/baz.whl"
        } 
      }
    } 
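
    If you also need to pass arguments to the wheel's entry point, the PythonWheelTask struct quoted in the question shows the remaining fields: parameters (a list of positional arguments) and named_parameters (a map of key/value arguments). A minimal sketch with placeholder values (the Jobs API generally expects you to set one of the two, not both):

    python_wheel_task {
      package_name = "my_package"
      entry_point  = "entry_point"
      # positional arguments passed to the entry point (placeholder values);
      # alternatively: named_parameters = { "env" = "prod" }
      parameters = ["--env", "prod"]
    }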
    

    P.S. It's better not to use interactive (all-purpose) clusters for jobs, as they are more expensive than job clusters; see the sketch below.
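
    A minimal sketch of the same job on an ephemeral job cluster, using a new_cluster block instead of existing_cluster_id (the Spark version and node type are illustrative placeholders, not values from the question):

    resource "databricks_job" "sample-tf-job" {
      name = var.job_name

      task {
        task_key = "a"
        # ephemeral job cluster: created for the run, terminated afterwards
        new_cluster {
          num_workers   = 1
          spark_version = "11.3.x-scala2.12"
          node_type_id  = "Standard_DS3_v2"  # Azure node type; adjust for your workspace
        }
        python_wheel_task {
          package_name = "my_package"
          entry_point  = "entry_point"
        }
        library {
          whl = "dbfs:/FileStore/baz.whl"
        }
      }
    }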