azure azure-devops azure-virtual-machine azure-cli

Azure VM run-commands freeze and block the DevOps pipeline from running new run-commands on the VM with Azure CLI


I'm running a couple of PowerShell scripts as part of our DevOps pipelines on Windows Azure VMs with the az vm run-command create .. tool. Sometimes the commands freeze and don't complete in a reasonable time. When this happens, I can't execute those commands anymore, and it starts to block the DevOps pipeline runs. If I try to delete the run-command with az vm run-command delete .., that also freezes.

Example of run-command execution (and delete) as part of a DevOps pipeline:

# Execute the script on VM with run-command
az vm run-command create \
    --name RecreateBinariesFolder$(Build.BuildId) \
    --vm-name ${{ parameters.vmName }} \
    --resource-group my-group-${{ parameters.environment }} \
    --timeout-in-seconds 600 \
    --script "if(Test-Path C:\\temp\\packages\\){ Remove-Item -Path c:\\temp\\packages -Force -Recurse } ; if(!(Test-Path C:\\temp\\packages\\)){ mkdir c:\\temp\\packages }"
# Delete the run-command from VM
az vm run-command delete \
    --name RecreateBinariesFolder$(Build.BuildId) \
    --vm-name ${{ parameters.vmName }} \
    --resource-group my-group-${{ parameters.environment }} \
    --yes

It usually works well on a fresh VM, but executions start to freeze after a while in later pipeline runs.

Is there a way to execute run-commands more reliably, or are there other convenient and more stable ways to run commands on Windows VMs?


Update 2024-01-23: After re-creating the VMs and starting to run the delete command with the --no-wait option, we haven't seen similar freezing in tens of pipeline runs. Most likely there was some unexpected issue with the VM itself or with the CustomScriptExtension that runs the commands on the VM.
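
As a minimal sketch of the change described in this update, the delete step can be wrapped so that it always passes --no-wait. The helper name delete_run_command and the placeholder values are assumptions for illustration, not part of Azure CLI or of the original pipeline; bash is assumed.

```shell
# Sketch: delete a managed run-command without blocking the pipeline step.
# delete_run_command is a hypothetical helper name, not an Azure CLI command.
delete_run_command() {
  local cmd_name="$1" vm_name="$2" rg_name="$3"
  # --no-wait tells the CLI not to wait for the long-running delete
  # operation to finish, so the step returns instead of hanging on it.
  az vm run-command delete \
    --name "$cmd_name" \
    --vm-name "$vm_name" \
    --resource-group "$rg_name" \
    --yes \
    --no-wait
}

# Example call (placeholder values):
# delete_run_command "RecreateBinariesFolder123" "my-vm" "my-group-dev"
```

Because the call returns before the delete has completed, a later step that reuses the same run-command name may still find the old one present; using a unique name per build, as the pipeline above does with $(Build.BuildId), sidesteps that.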


Update 2024-03-12: We also started to delete all existing Run Commands from all the VMs in the selected resource group before we run any new ones. This has helped in some cases, as pointed out in the selected answer.

RG_NAME=my-resource-group
# List only the VMs in the selected resource group
VM_NAMES=$(az vm list --resource-group "$RG_NAME" -o json | jq -r '.[].name')
for VM_NAME in $VM_NAMES; do
  # Strip stray CR/LF characters from the name
  VM_NAME=$(echo "$VM_NAME" | tr -d '\r\n')
  RUN_COMMANDS=$(az vm run-command list --vm-name "$VM_NAME" --resource-group "$RG_NAME" -o json)
  for CMD_NAME in $(echo "$RUN_COMMANDS" | jq -r '.[].name'); do
    CMD_NAME=$(echo "$CMD_NAME" | tr -d '\r\n')
    # --no-wait: do not block on the delete operation
    az vm run-command delete --name "$CMD_NAME" --vm-name "$VM_NAME" --resource-group "$RG_NAME" --yes --no-wait
  done
done

Solution

  • Solution: After re-creating the VMs and running the delete command with the --no-wait option, the asker no longer saw similar freezing.

    If you have the same issue, you can try the following steps to narrow it down.

    1. Use a Microsoft-hosted agent to run the pipeline. If it has the same issue, it may be related to the agent or your scripts.
    2. Run the same commands multiple times on the affected VM directly in the Azure portal.
    3. Create a new VM and run the same commands multiple times on it directly in the Azure portal. If the old VM freezes but the new one works, the issue may be related to the old VM itself. If both the old and the new VM work in the portal but the commands freeze when run from the pipeline, the issue may be related to the agent.
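
The repeated runs in steps 2 and 3 could also be scripted from a terminal instead of the portal. The following is only a sketch under several assumptions: it uses az vm run-command invoke, which is a different subcommand from the az vm run-command create used in the question, it relies on the GNU coreutils timeout command and bash, and the helper name repeat_run_command, the Get-Date test script, and the default values are all made up for illustration.

```shell
# Sketch: run one PowerShell command on a VM several times and report
# how each attempt ended. repeat_run_command is a hypothetical helper.
# Assumes bash, Azure CLI, and GNU coreutils `timeout` on the PATH.
repeat_run_command() {
  local vm_name="$1" rg_name="$2" attempts="${3:-5}" limit="${4:-600}"
  local i status
  for i in $(seq 1 "$attempts"); do
    # timeout stops the call after $limit seconds and exits with 124
    timeout "$limit" az vm run-command invoke \
      --name "$vm_name" \
      --resource-group "$rg_name" \
      --command-id RunPowerShellScript \
      --scripts "Get-Date" > /dev/null
    status=$?
    if [ "$status" -eq 124 ]; then
      echo "attempt $i: timed out after ${limit}s"
    else
      echo "attempt $i: exit code $status"
    fi
  done
}

# Example call (placeholder values): 10 attempts, 300 s limit each
# repeat_run_command "my-vm" "my-group-dev" 10 300
```

Running this against both the old and a freshly created VM gives the same comparison as step 3: timeouts on only the old VM would point at that VM, while clean runs on both would point back at the pipeline or agent.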