Tags: python, yaml, pyyaml

Deleting content from a YAML file using Python while retaining the original structure


I have a YAML file. I want to use my script to remove all "repository" entries whose names are not in a list of strings I have defined. My script:

import yaml

core_repos = ["REPO1",
              "REPO2"]

if __name__ == "__main__":
    yml_file_name = "azure-pipelines.yml"
    with open(yml_file_name, 'r') as yml_file:
        yml_content = yaml.safe_load(yml_file)
    repositories = yml_content.get("resources", {}).get("repositories", [])
    filtered_repositories = [repo for repo in repositories if repo.get("repository") in core_repos]
    yml_content["resources"]["repositories"] = filtered_repositories

    with open(yml_file_name, 'w') as f:
        yaml.safe_dump(yml_content, f, default_flow_style=False)

The original:

trigger:
  - release/test

pool:
  name: <REDACTED>-Linux
  demands:
    - agent.name -equals  <REDACTED>

# Overrides the value for Build.BuildNumber, which is used to name the artifact (ZIP file) that is produced
name: '$(Date:yyyyMMdd)T$(Hours)$(Minutes)$(Seconds)'

resources:
  repositories:
    - repository: REPO1
      type: git
      ref: release/test
      name: <REDACTED>/REPO1
      trigger:
        branches:
          include:
            - release/test

    - repository: REPO2
      type: git
      ref: release/test
      name: <REDACTED>/REPO2
      trigger:
        branches:
          include:
            - release/test

    - repository: REPO3
      type: git
      ref: release/test
      name: <REDACTED>/REPO3
      trigger:
        branches:
          include:
            - release/test

    - repository: REPO4
      type: git
      ref: release/test
      name: <REDACTED>/REPO4
      trigger:
        branches:
          include:
            - release/test

stages:
  - stage: 'BuildAndUploadArtifact'
    jobs:
      - job:
        workspace:
          clean: all
        steps:
          - checkout: self
          # Core repos
          - checkout: REPO1
          - checkout: REPO2
          - checkout: REPO3
          - checkout: REPO4

After running the script, the filtering itself works, but the rest of the output is wrong in several ways: the keys have been reordered (trigger ended up at the bottom), my comments are missing completely, the quotes and blank lines are gone, and the list indentation has changed. What is causing this? The output:

name: $(Date:yyyyMMdd)T$(Hours)$(Minutes)$(Seconds)
pool:
  demands:
  - agent.name -equals  <REDACTED>
  name: <REDACTED>-Linux
resources:
  repositories:
  - name: <REDACTED>/REPO1
    ref: release/test
    repository: REPO1
    trigger:
      branches:
        include:
        - release/test
    type: git
  - name: <REDACTED>/REPO2
    ref: release/test
    repository: REPO2
    trigger:
      branches:
        include:
        - release/test
    type: git
stages:
- jobs:
  - job: null
    steps:
    - checkout: self
    - checkout: REPO1
    - checkout: REPO2
    - checkout: REPO3
    - checkout: REPO4
    workspace:
      clean: all
  stage: BuildAndUploadArtifact
trigger:
- release/test

Solution

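  • I fixed it myself by switching to ruamel.yaml. PyYAML's safe_load throws away comments, quotes and blank lines when it parses the file into plain dicts and lists, and safe_dump sorts mapping keys alphabetically by default, which is why trigger moved to the bottom. ruamel.yaml's default round-trip mode keeps comments and key order, so this works: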

    from ruamel.yaml import YAML
    
    core_repos = ["REPO1", "REPO2"]
    
    if __name__ == "__main__":
        yml_file_name = "azure-pipelines.yml"
    
        yaml = YAML()
        yaml.preserve_quotes = True
        with open(yml_file_name, 'rb') as yml_file:
            yml_content = yaml.load(yml_file)
    
            repositories = yml_content.get("resources", {}).get("repositories", [])
            filtered_repositories = [repo for repo in repositories if repo.get("repository") in core_repos]
            yml_content["resources"]["repositories"] = filtered_repositories
    
        with open(yml_file_name, 'wb') as f:
            yaml.dump(yml_content, f)