Tags: terraform, databricks, token, azure-databricks

Creating a single-use dummy token to set the Databricks Token ACL with Terraform


To set the Databricks Token ACL with Terraform, a token needs to be available. I have a solution that creates a token on the first deployment with a validity of 1 day, but I am told that is not a proper/sustainable solution and I have to find an alternative.

Currently I am using the following Python script:

import requests
import json

DATABRICKS_INSTANCE = "https://XXXXXXXXXXXXXX.azuredatabricks.net" 
DATABRICKS_TOKEN = "dapiXXXXXXXXXXXXXXXXXXX"

def create_dummy_token():
    url = f"{DATABRICKS_INSTANCE}/api/2.0/token/create"
    headers = {
        "Authorization": f"Bearer {DATABRICKS_TOKEN}",
        "Content-Type": "application/json"
    }
    payload = {
        "comment": "Dummy token",
        "lifetime_seconds": 86400  # 1 day
    }

    response = requests.post(url, headers=headers, data=json.dumps(payload))
    if response.status_code == 200:
        token_info = response.json()
        with open("dummy_token.json", "w") as token_file:
            json.dump({"token_value": token_info["token_value"]}, token_file)
        print("Dummy token created successfully.")
    else:
        print(f"Failed to create token: {response.text}")

if __name__ == "__main__":
    create_dummy_token()

and this is how I am calling it in my main.tf:

// Create a dummy token using an external data source
provider "local" {}

resource "null_resource" "create_dummy_token" {
  triggers = {
    always_run = timestamp()
  }

  provisioner "local-exec" {
    command = "python3 databricks_dummy_token.py"
  }
}

data "external" "dummy_token" {
  depends_on = [null_resource.create_dummy_token]
  program    = ["bash", "-c", "cat dummy_token.json"]
}

output "dummy_token_result" {
  value = data.external.dummy_token.result
}

output "dummy_token_new" {
  value = data.external.dummy_token.result["token_value"]
}

Can someone please suggest an alternative solution, or at least how I can make this one better/more sustainable?


Solution

  • Can someone please suggest an alternative solution, or at least how I can make this one better/more sustainable?

    Basically, Databricks PAT tokens do not last indefinitely; they expire once their configured lifetime has ended.

    Below are a few alternative, more sustainable approaches you can try in your environment.

    First, you can increase lifetime_seconds in the payload block according to your token requirements (for example, 259200 seconds is 3 days), as shown below.

    payload = {
        "comment": "Dummy token",
        "lifetime_seconds": 259200  # 3 days
    }
    

    Token created successfully:


    Once the token is created, its lifetime is clearly visible in the Databricks workspace for the given time period.


    Apart from using null_resource in Terraform, an alternative approach is to store the generated PAT token in a key vault, as detailed below.

    Refer to the azurerm_key_vault_secret data source documentation.

    I copied and stored the PAT token value in a key vault secret using the Portal and retrieved it with the Terraform code below. Once that was done, I took the value from the output block to use later as required.

    While storing the token value as a secret in the key vault, you can set an expiration date (lifetime) for the secret for as long as the token is required in your environment.
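If you prefer to write the secret programmatically instead of via the Portal, a minimal sketch using the azure-identity and azure-keyvault-secrets packages could look like the following (the function names are hypothetical; the vault URL reuses the example vault name, and the secret expiry is matched to the PAT lifetime):

```python
from datetime import datetime, timedelta, timezone

KEY_VAULT_URL = "https://newvaultjj.vault.azure.net"  # vault name from the example

def token_expiry(lifetime_seconds: int) -> datetime:
    """Match the Key Vault secret's expiration to the PAT's lifetime."""
    return datetime.now(timezone.utc) + timedelta(seconds=lifetime_seconds)

def store_token_in_key_vault(secret_name: str, token_value: str,
                             lifetime_seconds: int) -> None:
    # Imported here so token_expiry stays usable without the Azure SDK installed.
    from azure.identity import DefaultAzureCredential
    from azure.keyvault.secrets import SecretClient

    client = SecretClient(vault_url=KEY_VAULT_URL,
                          credential=DefaultAzureCredential())
    # expires_on marks the secret as expired in Key Vault at the same moment
    # the PAT itself stops working, so stale tokens are easy to spot.
    client.set_secret(secret_name, token_value,
                      expires_on=token_expiry(lifetime_seconds))
```

This keeps the token out of local files like dummy_token.json, and the Terraform data source above can then read it back from the vault.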


    provider "azurerm" {
      features {}
    }

    data "azurerm_resource_group" "example" {
      name = "Jahnavi"
    }

    data "azurerm_key_vault" "existing" {
      name                = "newvaultjj"
      resource_group_name = data.azurerm_resource_group.example.name
    }

    data "azurerm_key_vault_secret" "example" {
      name         = "newsecret"
      key_vault_id = data.azurerm_key_vault.existing.id
    }

    output "secret_token_value" {
      value     = data.azurerm_key_vault_secret.example.value
      sensitive = true
    }


    You can also use a service principal instead of tokens to authenticate with the Databricks workspace, which is detailed in this blog.
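As a rough sketch of the service-principal route (assuming the azure-identity package; the tenant/client IDs are placeholders, and the resource ID below is the well-known Azure Databricks first-party application ID):

```python
# Azure Databricks first-party application ID (the AAD resource for Databricks).
DATABRICKS_RESOURCE_ID = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"

def databricks_scope(resource_id: str = DATABRICKS_RESOURCE_ID) -> str:
    """Build the AAD token scope for the Databricks resource."""
    return f"{resource_id}/.default"

def get_databricks_aad_token(tenant_id: str, client_id: str,
                             client_secret: str) -> str:
    # Hypothetical helper: authenticates the service principal and returns a
    # bearer token usable against the Databricks REST API in place of a PAT.
    from azure.identity import ClientSecretCredential

    credential = ClientSecretCredential(tenant_id, client_id, client_secret)
    return credential.get_token(databricks_scope()).token
```

Because AAD tokens are minted on demand, nothing long-lived needs to be stored or rotated in your Terraform state.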

    You can also create a class to handle PAT tokens using Databricks' 2.0 Token API. Refer to this SO answer for the complete Python code given by @BeGreen.
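A minimal sketch of such a class, wrapping the documented /api/2.0/token/create, /list, and /delete endpoints (the class name is hypothetical; instance URL and admin token are placeholders):

```python
import requests

class DatabricksTokenManager:
    """Small helper around the Databricks 2.0 Token API."""

    def __init__(self, instance_url: str, admin_token: str):
        self.base = f"{instance_url}/api/2.0/token"
        self.headers = {"Authorization": f"Bearer {admin_token}"}

    def create(self, comment: str, lifetime_seconds: int = 86400) -> dict:
        # POST /api/2.0/token/create returns token_value and token_info.
        r = requests.post(f"{self.base}/create", headers=self.headers,
                          json={"comment": comment,
                                "lifetime_seconds": lifetime_seconds})
        r.raise_for_status()
        return r.json()

    def list(self) -> list:
        # GET /api/2.0/token/list returns all tokens for the caller.
        r = requests.get(f"{self.base}/list", headers=self.headers)
        r.raise_for_status()
        return r.json().get("token_infos", [])

    def revoke(self, token_id: str) -> None:
        # POST /api/2.0/token/delete revokes a token by its token_id.
        r = requests.post(f"{self.base}/delete", headers=self.headers,
                          json={"token_id": token_id})
        r.raise_for_status()
```

With create/list/revoke in one place, a pipeline can revoke the dummy token right after the ACL is applied instead of waiting for it to expire.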