yaml yq

Sort Kubernetes NetworkPolicy by ports and then by ipBlock using yq


I have the following Kubernetes NetworkPolicy file:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  annotations:
    acm_host_group: 10.20.30.40/27,11.20.30.40/27
    acm_manifest_date: "2024-06-14T10:55:53.118658117Z"
    acm_user: SOME_USER
    platform: aws
  name: net-policy-name
  namespace: net-policy-namespace
spec:
  egress:
    - ports:
        - port: "6668"
          protocol: TCP
        - port: "7082"
          protocol: TCP
      to:
        - ipBlock:
            cidr: 22.125.168.151/32
        - ipBlock:
            cidr: 22.75.125.250/32
        - ipBlock:
            cidr: 20.168.200.109/32
        - ipBlock:
            cidr: 22.69.6.27/32
        - ipBlock:
            cidr: 22.102.220.40/32
    - ports:
        - port: "443"
          protocol: TCP
      to:
        - ipBlock:
            cidr: 76.88.32.117/32
        - ipBlock:
            cidr: 76.88.24.104/32
        - ipBlock:
            cidr: 76.88.0.95/32
        - ipBlock:
            cidr: 76.88.0.121/32
        - ipBlock:
            cidr: 22.96.129.17/32
        - ipBlock:
            cidr: 22.96.129.18/32
        - ipBlock:
            cidr: 76.88.24.127/32
        - ipBlock:
            cidr: 22.96.129.19/32
  podSelector: {}
  policyTypes:
    - ingress
    - egress

I would like to order the entries so that the ports are sorted by spec.egress[].ports[].port and then by spec.egress[].ports[].protocol, and the ipBlocks are then sorted by cidr. The file above would then look like this:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  annotations:
    acm_host_group: 10.20.30.40/27,11.20.30.40/27
    acm_manifest_date: "2024-06-14T10:55:53.118658117Z"
    acm_user: SOME_USER
    platform: aws
  name: net-policy-name
  namespace: net-policy-namespace
spec:
  egress:
    - ports:
        - port: "443"
          protocol: TCP
      to:
        - ipBlock:
            cidr: 22.96.129.17/32
        - ipBlock:
            cidr: 22.96.129.18/32
        - ipBlock:
            cidr: 22.96.129.19/32
        - ipBlock:
            cidr: 76.88.0.95/32
        - ipBlock:
            cidr: 76.88.0.121/32
        - ipBlock:
            cidr: 76.88.24.104/32
        - ipBlock:
            cidr: 76.88.24.127/32
        - ipBlock:
            cidr: 76.88.32.117/32
    - ports:
        - port: "6668"
          protocol: TCP
        - port: "7082"
          protocol: TCP
      to:
        - ipBlock:
            cidr: 20.168.200.109/32
        - ipBlock:
            cidr: 22.69.6.27/32
        - ipBlock:
            cidr: 22.75.125.250/32
        - ipBlock:
            cidr: 22.102.220.40/32
        - ipBlock:
            cidr: 22.125.168.151/32
  podSelector: {}
  policyTypes:
    - ingress
    - egress

I ultimately want to be able to compare/diff two files that have their elements in different orders. I've looked at this for a while and can't seem to come up with a decent solution. One option I considered was to convert to properties files (as per https://mikefarah.gitbook.io/yq/usage/tips-and-tricks#comparing-yaml-files) and then use sed/awk/grep to make my tweaks but that's a bit ugly. If anyone is up for a challenge I'd appreciate some guidance.


Solution

  • NOTE: A better approach may be to expand the ports and ipBlocks so that you have one port*ipBlock pair for every combination of port and cidr. Otherwise, I worry that a canonically sorted YAML file won't necessarily show that two policies are equivalent.

    I'm unsure whether yq can do the sorting you describe; the challenge I ran into is iterating over the .spec.egress[].to arrays and sorting each of them.

    This should be relatively straightforward in e.g. Python. The following was only eyeballed against your example.

    Caveat developer:
    python3 -m venv venv
    source venv/bin/activate
    python3 -m pip install pyyaml
    

    canonical.py:

    import sys
    import yaml
    
    # Parse stdin as YAML
    y = yaml.safe_load(sys.stdin)
    
    # Iterate over .spec.egress
    for egress in y["spec"]["egress"]:
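        # Sort .ports by port, then by protocol (a missing protocol is treated as TCP)
        egress["ports"].sort(key=lambda x: (str(x["port"]), x.get("protocol", "TCP")))
        # Sort .to by ipBlock.cidr (plain string comparison, not numeric)
        egress["to"].sort(key=lambda x: x["ipBlock"]["cidr"])
    
    # Sort the egress rules by the port of each rule's first (already sorted) ports entry
    y["spec"]["egress"].sort(key=lambda x: str(x["ports"][0]["port"]))
    
    # Write the result to stdout
    yaml.dump(y, sys.stdout)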
    

    And:

    cat example.yaml | python3 canonical.py
    
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      annotations:
        acm_host_group: 10.20.30.40/27,11.20.30.40/27
        acm_manifest_date: '2024-06-14T10:55:53.118658117Z'
        acm_user: SOME_USER
        platform: aws
      name: net-policy-name
      namespace: net-policy-namespace
    spec:
      egress:
      - ports:
        - port: '443'
          protocol: TCP
        to:
        - ipBlock:
            cidr: 22.96.129.17/32
        - ipBlock:
            cidr: 22.96.129.18/32
        - ipBlock:
            cidr: 22.96.129.19/32
        - ipBlock:
            cidr: 76.88.0.121/32
        - ipBlock:
            cidr: 76.88.0.95/32
        - ipBlock:
            cidr: 76.88.24.104/32
        - ipBlock:
            cidr: 76.88.24.127/32
        - ipBlock:
            cidr: 76.88.32.117/32
      - ports:
        - port: '6668'
          protocol: TCP
        - port: '7082'
          protocol: TCP
        to:
        - ipBlock:
            cidr: 20.168.200.109/32
        - ipBlock:
            cidr: 22.102.220.40/32
        - ipBlock:
            cidr: 22.125.168.151/32
        - ipBlock:
            cidr: 22.69.6.27/32
        - ipBlock:
            cidr: 22.75.125.250/32
      podSelector: {}
      policyTypes:
      - ingress
      - egress
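
Two things to be aware of. The script above compares ports and CIDRs as strings, which is why the output lists 76.88.0.121 before 76.88.0.95, whereas the expected output in the question has them in numeric order. It also keeps the original grouping of rules, so two policies that allow the same traffic but group it differently would still differ. The following is a rough sketch of both ideas, not a tested tool: the helper names port_key, cidr_key and expand are made up for illustration, and it assumes every entry under .to is an ipBlock, as in your example.

```python
import ipaddress
from itertools import product


def port_key(entry):
    """Order numeric ports by value, ahead of named ports; then by protocol."""
    port = str(entry["port"])
    protocol = entry.get("protocol", "TCP")  # assumption: missing means TCP
    if port.isdigit():
        return (0, int(port), "", protocol)
    return (1, 0, port, protocol)


def cidr_key(peer):
    """Order peers by network address and prefix length, not by string."""
    # Assumption: every peer has an ipBlock (no podSelector/namespaceSelector).
    net = ipaddress.ip_network(peer["ipBlock"]["cidr"], strict=False)
    return (net.version, int(net.network_address), net.prefixlen)


def expand(policy):
    """Flatten all egress rules into sorted (port, protocol, cidr) tuples."""
    rows = set()
    for rule in policy["spec"].get("egress", []):
        for p, peer in product(rule.get("ports", []), rule.get("to", [])):
            rows.add((
                port_key(p),
                cidr_key(peer),
                str(p["port"]),
                p.get("protocol", "TCP"),
                peer["ipBlock"]["cidr"],
            ))
    # Sort on the two key tuples, then drop them from the result.
    return [row[2:] for row in sorted(rows)]
```

In canonical.py, the two key functions could be passed as key= to the existing sort calls to get numeric ordering. For a diff that ignores grouping, one option is to print each tuple returned by expand() on its own line for both files and compare the two listings.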