Tags: azure-powershell, azure-devops, azure-databricks, azure-service-principal

Databricks Bearer Token Creation API Call from PowerShell Script Erroring out with 403 Error


I am trying to create a CI/CD pipeline using the YAML method in Azure DevOps. The YAML file runs a PowerShell script that calls the Azure Databricks API to generate a bearer token. I am using a service principal for this, but the step that sends the POST request to the API fails with the error below:

Azure DevOps Output

Below is the PS script that I am trying to use:

param
(
    [parameter(Mandatory = $true)] [String] $databricksWorkspaceResourceId,
    [parameter(Mandatory = $true)] [String] $databricksWorkspaceUrl,
    [parameter(Mandatory = $true)] [String] $databricksOrgId,
    [parameter(Mandatory = $false)] [int] $tokenLifeTimeSeconds = 300
)

# Well-known application ID of the Azure Databricks first-party application
$azureDatabricksPrincipalId = '2ff814a6-3304-4ab8-85cb-cd0e6f879c1d'

$headers = @{}
$headers["Authorization"] = "Bearer $((az account get-access-token --resource $azureDatabricksPrincipalId | ConvertFrom-Json).accessToken)"
$headers["X-Databricks-Azure-SP-Management-Token"] = "$((az account get-access-token --resource https://management.core.windows.net/ | ConvertFrom-Json).accessToken)"
$headers["X-Databricks-Org-Id"] = $databricksOrgId
$headers["X-Databricks-Azure-Workspace-Resource-Id"] = $databricksWorkspaceResourceId

Write-Verbose $databricksWorkspaceResourceId
Write-Verbose $databricksWorkspaceUrl

$json = @{}
$json["lifetime_seconds"] = $tokenLifeTimeSeconds

$req = Invoke-WebRequest -Uri "https://$databricksWorkspaceUrl/api/2.0/token/create" -Body ($json | ConvertTo-Json) -ContentType "application/json" -Headers $headers -Method Post
$bearerToken = ($req.Content | ConvertFrom-Json).token_value

return $bearerToken

All the required parameters are being passed from the master YAML file.

The service principal I am using has been granted Contributor access to the required resource group and to the Databricks workspace as well. It has also been given the API permission on AzureDatabricks.

Service Principal Permissions

Is it that the service principal has not been granted admin consent for the API permissions? Or is it because the service principal has not been assigned the "Owner" role at the Databricks level?

Could someone please help me figure out what the issue is?

Note: the AAD token and the management access token are being generated as expected by my PS script, and I have verified both.


Solution

  • You can check the following things to resolve the issue:

    1. When using the `az account get-access-token` command to generate the AAD access token, ensure the associated user (or identity) has been added as an Azure Databricks workspace admin on the workspace Admin Settings page. In your case, you need to assign the workspace admin role to the service principal.

    2. Use the service principal to create an Azure Resource Manager (ARM) service connection. Also see "Connect to Microsoft Azure".

    3. Then use an Azure CLI task with the ARM service connection in the pipeline to run a PowerShell script that generates the Databricks token for the service principal.

      variables:
        resourceID: 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d
        workspaceUrl: {workspace Url}
      
      steps:
      - task: AzureCLI@2
        displayName: 'Create Databricks PAT'
        inputs:
          azureSubscription: {ARM service connection name}
          scriptType: pscore
          scriptLocation: inlineScript
          inlineScript: |
            Write-Host "Generate AAD Access Token for Azure Databricks service."
            $AAD_Token = (az account get-access-token --resource $(resourceID) --query "accessToken" --output tsv)
            Write-Host $AAD_Token
      
            Write-Host "Generate Azure Databricks PAT."
            $DatabricksPAT_response = (curl --request POST "$(workspaceUrl)/api/2.0/token/create" --header "Authorization: Bearer $AAD_Token" --data '{\"lifetime_seconds\": 600, \"comment\": \"This is an example token.\"}')
            # curl returns the response body as a JSON string, so parse it before reading token_value
            $DatabricksPAT = ($DatabricksPAT_response | ConvertFrom-Json).token_value
            Write-Host $DatabricksPAT
      

    Here is a similar case for reference.


    In addition, you also can try to use the following Bash script to generate the token.

    tenant_ID='xxxx'
    client_id='xxxx'
    client_secret='xxxx'
    workspaceUrl='xxxx'
    
    aad_token=$(curl -X POST -H "Content-Type: application/x-www-form-urlencoded" "https://login.microsoftonline.com/$tenant_ID/oauth2/v2.0/token" \
    -d "grant_type=client_credentials&client_id=$client_id&client_secret=$client_secret&scope=2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default" | jq -r '.access_token')
    echo $aad_token
    
    Databricks_pat=$(curl --request POST "$workspaceUrl/api/2.0/token/create" -H "Authorization: Bearer $aad_token" \
    -d '{"lifetime_seconds": 600, "comment": "This is an example token."}' | jq -r '.token_value')
    echo $Databricks_pat
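
    If the call is rejected (for example with a 403), the response body carries an error payload instead of `token_value`, and the `jq -r '.token_value'` step silently prints `null`. As a small sketch (the sample response below is hypothetical, shaped like the Token API's documented output), you can make the parsing step tolerant of both cases:

```shell
# Hypothetical response from /api/2.0/token/create; the values are made up
response='{"token_value":"dapi1234567890abcdef","token_info":{"token_id":"5715498424f0","comment":"This is an example token."}}'

# Prefer token_value; fall back to the error fields if the call was rejected
Databricks_pat=$(echo "$response" | jq -r '.token_value // empty')
if [ -z "$Databricks_pat" ]; then
  echo "Token creation failed: $(echo "$response" | jq -r '.error_code // "UNKNOWN"')" >&2
else
  echo "$Databricks_pat"
fi
```

    Printing the error code instead of `null` makes it much easier to tell a permission problem (403) apart from a malformed request (400) in the pipeline logs.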
    

    EDIT:

    You can follow the steps below to assign the workspace admin role to the Service Principal:

    1. Open the workspace Admin Settings page.


    2. Add the Service Principal into the workspace.


    3. Add the Service Principal to the admins group in the workspace.

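
    If you prefer to do step 2 without the UI, the workspace SCIM API can add the service principal programmatically. This is only a sketch: the application ID and display name below are placeholders, and the commented-out `curl` call assumes a live workspace URL and an AAD token generated as in the scripts above.

```shell
sp_app_id='00000000-0000-0000-0000-000000000000'  # placeholder: your SP's application (client) ID

# Build the SCIM request body with jq so the quoting stays valid JSON
payload=$(jq -n --arg appId "$sp_app_id" '{
  schemas: ["urn:ietf:params:scim:schemas:core:2.0:ServicePrincipal"],
  applicationId: $appId,
  displayName: "cicd-service-principal"
}')
echo "$payload"

# Then POST it to the workspace (requires a live workspace and AAD token):
# curl --request POST "$workspaceUrl/api/2.0/preview/scim/v2/ServicePrincipals" \
#   --header "Authorization: Bearer $aad_token" \
#   --header "Content-Type: application/scim+json" \
#   --data "$payload"
```

    After the service principal exists in the workspace, you can still add it to the admins group through the UI as in step 3.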


    EDIT_2:

    If your Azure Databricks workspace has network restrictions (e.g., a firewall or proxy server) set, then when you run the related API/CLI calls in Azure Pipelines to access the workspace, you generally need to add the IP addresses (or IP ranges) of the agent machines to the whitelist of your Azure Databricks workspace.

    1. If you are using Microsoft-hosted agents to run the pipelines, you can add the service tags (AzureCloud.<region>) of all the possible Microsoft-hosted agents within the same Azure geography as your organization to the whitelist, for example AzureCloud.centralus, AzureCloud.eastus, AzureCloud.westus, etc.

    2. If you are using self-hosted agents installed on your own local machines or VMs, and the IP addresses of the machines do not change often, you can directly add the IP addresses of the machines to the whitelist.
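
    The IP ranges behind a service tag can be listed with `az network list-service-tags --location <region>` (requires a logged-in Azure CLI). The snippet below filters a hypothetical excerpt of that command's JSON output with jq; when running it for real, feed in the live output instead of the sample:

```shell
# Hypothetical excerpt of `az network list-service-tags --location eastus` output;
# the prefixes are made up for illustration
tags='{"values":[{"name":"AzureCloud.eastus","properties":{"addressPrefixes":["20.42.0.0/17","40.71.0.0/16"]}}]}'

# Extract the prefixes for the tag you want to whitelist
echo "$tags" | jq -r '.values[] | select(.name == "AzureCloud.eastus") | .properties.addressPrefixes[]'
```

    Note that Microsoft updates these ranges periodically, so a pipeline that refreshes the whitelist on a schedule is more robust than a one-off manual entry.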