git azure-devops azure-storage

Pushing data from Azure Storage Account to Azure DevOps Repo Daily


I've created a Python script that pulls data structures from a SQL database (tables, views, indexes, stored procedures) and saves them to an Azure Storage Account in a folder named in 'DD-MM-YYYY' format.

I want to create the next module, which takes these scripts from the folder and pushes them to the Azure DevOps repo as a commit on a new branch that is immediately merged into main, so I can see the daily changes made to the database.

Is it possible to do this in Databricks, or do I have to use Azure DevOps pipelines? If DevOps is the only way, how can I do it?


Solution

  • In a pipeline, you can use the Azure CLI command "az storage copy" to copy folders from the Azure storage account to a local directory on the agent machine, and then use git commands to commit and push the folders to Azure Repos.

    1. Ensure the latest daily data folder (e.g., 06-08-2024) exists in the storage account. In the example below, the folders are stored in a container (daily-sqldb-data) of the storage account.

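      To sanity-check that the expected date-named folder is present before wiring up the pipeline, you can list its blobs with the Azure CLI. A minimal sketch, assuming you are signed in and reusing this example's account and container names:

      # List blobs under today's 'DD-MM-YYYY' folder to confirm the daily upload landed.
      # '--auth-mode login' assumes the signed-in identity has a blob data role on the account.
      az storage blob list \
        --account-name storagebriran \
        --container-name daily-sqldb-data \
        --prefix "$(date +%d-%m-%Y)/" \
        --auth-mode login \
        --output table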

    2. In Azure DevOps, I have a Git repository (daily-sqldb-data). Go to "Project Settings" > "Repositories", and on the "Security" tab of the repository, ensure the following two identities have the "Contribute" permission set to "Allow". These are the build identities that the pipeline's git push authenticates as when persistCredentials is enabled in the checkout step.

      • Project Collection Build Service ({Organization Name})
      • {Project Name} Build Service ({Organization Name})


    3. In Azure DevOps, go to "Project Settings" > "Service connections" and create an Azure Resource Manager (ARM) service connection that can access the Azure subscription containing the storage account.
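      Note that "az storage copy" is a data-plane operation, so the service connection's service principal typically also needs a blob data role (e.g., "Storage Blob Data Reader") on the storage account; the ARM connection alone is not always enough. A one-time role assignment could look like the sketch below, where the assignee and scope values are placeholders for your environment:

      # Grant the service connection's principal read access to blob data.
      # '<sp-object-id>', '<sub-id>' and '<rg>' are placeholders.
      az role assignment create \
        --assignee "<sp-object-id>" \
        --role "Storage Blob Data Reader" \
        --scope "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/storagebriran"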

    4. In the pipeline, use the AzureCLI@2 task to run the command "az storage copy".

    # azure-pipelines.yml
    
    variables:
      accountName: 'storagebriran'
      containerName: 'daily-sqldb-data'
    
    pool:
      vmImage: ubuntu-latest
    
    steps:
    - checkout: self
      persistCredentials: true
    
    # Create a temporary local branch named in 'DD-MM-YYYY' format to receive the data downloaded from the storage account.
    - task: Bash@3
      displayName: 'Create temporary branch'
      inputs:
        targetType: inline
        script: |
          git config --global user.name "[Auto-Boot]"
          git config --global user.email AutoBoot@example.com
    
          branchName=$(date +"%d-%m-%Y")
          echo "##vso[task.setvariable variable=folderName;]$branchName"
    
          echo "Create '$branchName' branch."
          git branch $branchName
          git checkout $branchName
    
    # Download data folder 'DD-MM-YYYY' from storage account.
    - task: AzureCLI@2
      displayName: 'Download Daily Data'
      inputs:
        azureSubscription: 'myArmConnection'
        scriptType: 'bash'
        scriptLocation: 'inlineScript'
        inlineScript: |
          echo "Download entire folder '$(folderName)' from container '$(containerName)' of Storage account '$(accountName)'."
          az storage copy -s https://$(accountName).blob.core.windows.net/$(containerName)/$(folderName) -d "$(Build.SourcesDirectory)" --recursive
    
    # Merge the downloaded data folder 'DD-MM-YYYY' from the local temporary branch into the local main branch,
    # then push the updates from the local main branch to the remote.
    - task: Bash@3
      displayName: 'Merge and Push Daily Data'
      inputs:
        targetType: inline
        script: |
          git add --all
          git commit -m "Add daily data in '$(folderName)' folder."

          echo "----------------------------------------------"
          echo "Merge daily data to main branch."
          git checkout main
          git merge $(folderName)
          git branch -d $(folderName)

          echo "----------------------------------------------"
          echo "Pushing changes to remote..."
          # The merge fast-forwards main to the daily commit, so there is nothing new to commit; just push.
          git push origin main
    

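    As a side note, the plain git push above works because of persistCredentials: true in the checkout step. If you would rather not persist credentials, a documented alternative is to pass the build's access token on the individual git call via the System.AccessToken variable; a minimal sketch:

    # Alternative: authenticate a single git call with the pipeline's access token.
    - task: Bash@3
      displayName: 'Push without persisted credentials'
      inputs:
        targetType: inline
        script: |
          git -c http.extraheader="AUTHORIZATION: bearer ${SYSTEM_ACCESSTOKEN}" push origin main
      env:
        SYSTEM_ACCESSTOKEN: $(System.AccessToken)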

    If you want the pipeline to be triggered automatically every day, you can set a scheduled trigger on it.

    # Disable the CI trigger.
    trigger: none
    
    # Trigger the pipeline at 23:00 UTC every day. The time zone for cron schedules is UTC.
    # If you want to schedule a time in your time zone, you need to calculate the corresponding UTC time.
    # For example, the time 22:00 in UTC+8 time zone is 14:00 in UTC time zone.
    schedules:
    - cron: 0 23 * * *
      displayName: 'Daily run at 23:00 UTC'
      branches:
        include:
        - main
    
    . . .
    
    steps:
    . . .
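    Once a few daily folders have accumulated on main, you can compare any two snapshots to see what changed in the database between those days. For example (05-08-2024 is a hypothetical previous day's folder):

    # Diff two daily snapshot folders in the working tree.
    git diff --no-index 05-08-2024 06-08-2024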