
Deploying queue, blob and ADLS Gen2 private endpoints via Bicep breaks connectivity


I am trying to deploy three Azure Storage resources across two storage accounts, and I want to add three private endpoints so that these resources can only be reached from VMs in the same VNet. The resources that need to be reachable are (per storage account):

storageAccountTemp

  1. Azure Queue Storage queue
  2. Azure Blob Storage container

storageAccountDatalake

  1. ADLS Gen2 containers (data lake)

I have the following Azure Bicep code for deploying the permanent and temporary stores:

param location string
param environmentType string
param storageAccountSku string
param privateEndpointsSubnetId string

var privateEndpointNameTmpstBlob = 'pe-tmpst-blob-${environmentType}-001'
var privateEndpointNameTmpstQueue = 'pe-tmpst-queue-${environmentType}-001'
var privateEndpointNamePst = 'pe-pst-${environmentType}-001'

/// Temp storage ///
resource storageAccountTemp 'Microsoft.Storage/storageAccounts@2021-08-01' = {
  name: 'tmpst${environmentType}'
  location: location
  sku: {
    name: storageAccountSku
  }
  kind: 'StorageV2'
  properties: {
    allowBlobPublicAccess: false
    accessTier: 'Hot'
    minimumTlsVersion: 'TLS1_0'
    publicNetworkAccess: 'Disabled'
  }
}

resource blobContainerForQueue 'Microsoft.Storage/storageAccounts/blobServices/containers@2021-08-01' = {
  name: '${storageAccountTemp.name}/default/claimcheck-storage-${environmentType}'
  properties: {
    publicAccess: 'None'
  }
}

resource storageQueueMain 'Microsoft.Storage/storageAccounts/queueServices/queues@2019-06-01' = {
  name: '${storageAccountTemp.name}/default/queue-main-${environmentType}'
}

/// Persistent storage datalake ///
resource storageAccountDatalake 'Microsoft.Storage/storageAccounts@2021-08-01' = {
  name: 'pstdatalake${environmentType}'
  location: location
  sku: {
    name: storageAccountSku
  }
  kind: 'StorageV2'
  properties: {
    allowBlobPublicAccess: false
    accessTier: 'Hot'
    minimumTlsVersion: 'TLS1_0'
    isHnsEnabled: true
    publicNetworkAccess: 'Disabled'
  }
}

/// Data///
resource ContainerForData 'Microsoft.Storage/storageAccounts/blobServices/containers@2021-08-01' = {
  name: '${storageAccountDatalake.name}/default/data-${environmentType}'
  properties: {
    publicAccess: 'None'
  }
}

/// Private endpoints configuration for tempblob, queue and datalake ///
resource privateEndpointTmpstBlob 'Microsoft.Network/privateEndpoints@2021-05-01' = if (environmentType == 'dev' || environmentType == 'prd') {
  name: privateEndpointNameTmpstBlob
  location: location
  properties: {
    subnet: {
      id: privateEndpointsSubnetId
    }
    privateLinkServiceConnections: [
      {
        name: privateEndpointNameTmpstBlob
        properties: {
          privateLinkServiceId: storageAccountTemp.id
          groupIds: ['blob']
        }
      }
    ]
  }
}

resource privateEndpointTmpstQueue 'Microsoft.Network/privateEndpoints@2021-05-01' = if (environmentType == 'dev' || environmentType == 'prd') {
  name: privateEndpointNameTmpstQueue
  location: location
  properties: {
    subnet: {
      id: privateEndpointsSubnetId
    }
    privateLinkServiceConnections: [
      {
        name: privateEndpointNameTmpstQueue
        properties: {
          privateLinkServiceId: storageAccountTemp.id
          groupIds: ['queue']
        }
      }
    ]
  }
}

resource privateEndpointPst 'Microsoft.Network/privateEndpoints@2021-05-01' = if (environmentType == 'dev' || environmentType == 'prd') {
  name: privateEndpointNamePst
  location: location
  properties: {
    subnet: {
      id: privateEndpointsSubnetId
    }
    privateLinkServiceConnections: [
      {
        name: privateEndpointNamePst
        properties: {
          privateLinkServiceId: storageAccountDatalake.id
          groupIds: ['blob']
        }
      }
    ]
  }
}

As you can see, isHnsEnabled is set to true on the datalake storage account to enable the hierarchical namespace and thus ADLS Gen2 functionality. The problem is that if I include the privateEndpointPst resource in the Bicep deployment and then try to view a datalake container in the portal from a VM in the same VNet as the private endpoint (the endpoints live in the subnet referenced by the privateEndpointsSubnetId parameter), I get the following message when trying to look at files in one of the datalake containers: [screenshot of the portal error message]

I don't believe the causes suggested in the screenshot are the real problem: when I deploy all three endpoints together, the blob, queue and datalake resources in both storageAccountTemp and storageAccountDatalake all show this same error.

However, if I deploy only the two endpoints for the storageAccountTemp resources and leave out the one for the datalake, I can see the data in the portal from the VM in the VNet, and code running on that VM can also reach the queue and the blob container. So deploying privateEndpointPst not only breaks datalake reachability, it also somehow breaks reachability of the queue and blob in storageAccountTemp when all three endpoints are deployed together. I can't work out why this happens, or why I can't seem to deploy the datalake endpoint correctly. To make it stranger, deploying all the endpoints together sometimes DOES make the datalake endpoint work, but then breaks the other two.

Clicking "Do you want to do some checks to detect common connectivity issues?" gives me the following information, which does not tell me much about the cause (I'm fairly sure it is not a firewall issue, since sometimes I can access the resources and sometimes I cannot): [screenshot of the connectivity check results]

Does anyone see what could be wrong with my Bicep code for deploying the endpoint that might be causing this? I'm at quite a loss here. I also tried replacing groupIds: ['blob'] with groupIds: ['dfs'], but that does not seem to solve the problem.


Solution

  • I seem to have found the issue. To connect to a datalake (ADLS Gen2) resource, you need two private endpoints: one with groupIds: ['blob'] and one with groupIds: ['dfs'], since the blob API is still used for getting some metadata about the containers (as far as I understand).

    So adding the following DFS endpoint next to the existing blob endpoint (privateEndpointPst):

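    // Name for the DFS endpoint (hypothetical value following the existing naming convention)
    var privateEndpointNamePstDfs = 'pe-pst-dfs-${environmentType}-001'

    // Second endpoint on the datalake account, targeting the 'dfs' sub-resource
    resource privateEndpointPstDfs 'Microsoft.Network/privateEndpoints@2021-05-01' = if (environmentType == 'dev' || environmentType == 'prd') {
      name: privateEndpointNamePstDfs
      location: location
      properties: {
        subnet: {
          id: privateEndpointsSubnetId
        }
        privateLinkServiceConnections: [
          {
            name: privateEndpointNamePstDfs
            properties: {
              privateLinkServiceId: storageAccountDatalake.id
              groupIds: ['dfs']
            }
          }
        ]
      }
    }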
    

    Made the deployment work successfully.
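For reference, the two datalake endpoints can also be declared with a single Bicep loop over the sub-resource names, which makes it harder to forget one of them. This is a sketch, not the answer's original code: it assumes the params and the storageAccountDatalake resource from the question are in scope, it would replace privateEndpointPst and privateEndpointPstDfs rather than sit next to them, and the endpoint names and the privateDnsZoneIds parameter are hypothetical. The DNS zone group part is only relevant if nothing else in your environment (for example an Azure Policy) already registers the endpoints in the privatelink DNS zones; without those records the storage hostnames typically resolve to the public endpoint, which is disabled here.

```bicep
// Sketch: blob + dfs private endpoints for the ADLS Gen2 account, created in one loop.
// Assumes location, environmentType, privateEndpointsSubnetId and storageAccountDatalake
// from the question are declared in the same file.

// Hypothetical parameter: resource IDs of EXISTING private DNS zones
// (privatelink.blob.core.windows.net and privatelink.dfs.core.windows.net)
// that are already linked to the VNet, e.g. { blob: '<zone id>', dfs: '<zone id>' }.
param privateDnsZoneIds object

// Sub-resources an ADLS Gen2 account needs an endpoint for
var datalakeGroupIds = [
  'blob'
  'dfs'
]

var deployEndpoints = environmentType == 'dev' || environmentType == 'prd'

resource privateEndpointsPst 'Microsoft.Network/privateEndpoints@2021-05-01' = [for groupId in datalakeGroupIds: if (deployEndpoints) {
  // Hypothetical naming, following the question's convention
  name: 'pe-pst-${groupId}-${environmentType}-001'
  location: location
  properties: {
    subnet: {
      id: privateEndpointsSubnetId
    }
    privateLinkServiceConnections: [
      {
        name: 'pe-pst-${groupId}-${environmentType}-001'
        properties: {
          privateLinkServiceId: storageAccountDatalake.id
          groupIds: [groupId]
        }
      }
    ]
  }
}]

// Registers each endpoint's private IP in the matching privatelink zone
resource privateDnsZoneGroupsPst 'Microsoft.Network/privateEndpoints/privateDnsZoneGroups@2021-05-01' = [for (groupId, i) in datalakeGroupIds: if (deployEndpoints) {
  parent: privateEndpointsPst[i]
  name: 'default'
  properties: {
    privateDnsZoneConfigs: [
      {
        name: groupId
        properties: {
          privateDnsZoneId: privateDnsZoneIds[groupId]
        }
      }
    ]
  }
}]
```

The same loop pattern would work for storageAccountTemp with ['blob', 'queue'] and a privatelink.queue.core.windows.net zone for the queue endpoint.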