terraform policy open-policy-agent rego

Using walk to recursively aggregate resources in a Terraform state with Rego


I'm using Open Policy Agent to write policy against the JSON output of my Terraform state.

Here is the structure of the state file:

{
  "format_version": "0.1",
  "terraform_version": "0.12.28",
  "values": {
    "root_module": {
      "resources": [],
      "child_modules": [
        {
          "resources": [],
          "address": "",
          "child_modules": [
            {
              "resources": [],
              "address": "",
              "child_modules": [
                {}
              ]
            }
          ]
        }
      ]
    }
  }
}

I have this nasty rule defined that achieves what I want, but it is obviously not an ideal way to be aggregating these resources.

resources[resource_type] = all {
    some resource_type
    resource_types[resource_type]
    rm := tfstate.values.root_module

    # I think the below can be simplified with the built in "walk" function TODO: do that.
    root_resources := [name |
        name := rm.resources[_]
        name.type == resource_type
    ]

    cmone_resources := [name |
        name := rm.child_modules[_].resources[_]
        name.type == resource_type
    ]

    cmtwo_resources := [name |
        name := rm.child_modules[_].child_modules[_].resources[_]
        name.type == resource_type
    ]

    cm := array.concat(cmone_resources, cmtwo_resources)

    all := array.concat(cm, root_resources)
}

I have read the documentation for the built-in function walk(x, [path, value]). I believe this function does what I want, but based on that documentation and the admittedly sparse examples I've found elsewhere, I cannot figure out how to get it to work the way I'm expecting.

I've included a playground with a very basic setup and the current rule I have defined. Any help at all would be greatly appreciated.


Solution

  • You're on the right track, and using walk would definitely be a good approach for collecting arbitrarily nested child resources.

    To get started, let's explore what walk does. It essentially iterates over every node in the object being walked, and for each one it gives the "path" and the node's value. The path is an array of keys, so for an object like:

    {"a": {"b": {"c": 123}}}
    

    if we walk over it (the example below uses the opa run REPL):

    > [path, value] = walk({"a": {"b": {"c": 123}}})
    +---------------+-----------------------+
    |     path      |         value         |
    +---------------+-----------------------+
    | []            | {"a":{"b":{"c":123}}} |
    | ["a"]         | {"b":{"c":123}}       |
    | ["a","b"]     | {"c":123}             |
    | ["a","b","c"] | 123                   |
    +---------------+-----------------------+
    

    We see every path and value combination in the object, bound to the variables path and value. You can capture any of these while iterating in a partial rule (like your resources rule) or in a comprehension.
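
    As a minimal sketch of the comprehension form, the snippet below filters the walk results by value and by path length. The package name and the names doc, number_paths and depth_two_values are made up for illustration; doc is the same object as in the REPL example.

    ```rego
    package walk_demo

    # Hypothetical sample document, same shape as the REPL example.
    doc := {"a": {"b": {"c": 123}}}

    # Paths of every node whose value is a number.
    number_paths := [path |
        [path, value] := walk(doc)
        is_number(value)
    ]

    # Values found exactly two keys deep.
    depth_two_values := [value |
        [path, value] := walk(doc)
        count(path) == 2
    ]
    ```

    For this document, number_paths evaluates to [["a","b","c"]] and depth_two_values to [{"c":123}], matching the last and third rows of the table.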

    Now let's apply this to the Terraform state. If we modify the playground example to walk over the example input (which is slightly modified to give things unique names), we get:

    walk_example[path] = value {
        [path, value] := walk(tfstate)
    }
    

    https://play.openpolicyagent.org/p/2u5shGbrV2

    If you look at the resulting value for walk_example we can see all the paths and values we'd expect to have to handle.

    From there it is a matter of filtering by resource_types, similar to what you've done in the resources rule. Instead of iterating over the set, we will use it as a lookup to check whether each resource's type is one we want, and we will first build a full set of all resources (without grouping them by type). The reasoning is that walking over every node of the input JSON is expensive, so we want to do it only once. A second pass over the much smaller set of resources can then group them by type as needed.

    An updated version would look something like:

    walk_resources[resource] {
        [path, value] := walk(tfstate)

        # Attempt to iterate over "resources" of the value. If the key doesn't
        # exist that's OK: this iteration of walk will be undefined and excluded
        # from the results.
        # Note: if you need to be sure it is a "real" resource list, and not some
        # other key named "resources", you can validate the path here.
        resource := value.resources[_]

        # Check that the resource type is in the set of desired resource types.
        resource_types[resource.type]
    }
    }
    

    https://play.openpolicyagent.org/p/TyqMKDyWyh

    The playground input above was updated to include another level of nesting, and types on the example resources. You can see that the original resources output is missing the depth 3 resource, while the walk_resources set contains all of the expected ones.
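
    On the note in the code comments about validating the path: here is one possible sketch, not a definitive check. It assumes module nodes always sit under values.root_module and are reached only through child_modules keys and array indexes, so a path that passes through a "resources" key cannot be a module. The tfstate and resource_types values below are small hypothetical stand-ins for the ones in the playground, and module_resource_names is a made-up name. It is written as a comprehension rather than a partial rule.

    ```rego
    package terraform_walk_sketch

    # Hypothetical stand-in for the playground's tfstate. The first resource
    # has an attribute that happens to be called "resources".
    tfstate := {"values": {"root_module": {
        "resources": [{
            "type": "aws_instance",
            "name": "web",
            "values": {"resources": [{"type": "aws_instance", "name": "decoy"}]}
        }],
        "child_modules": [{
            "address": "module.db",
            "resources": [{"type": "aws_db_instance", "name": "main", "values": {}}]
        }]
    }}}

    # Hypothetical stand-in for the playground's resource_types set.
    resource_types := {"aws_instance", "aws_db_instance"}

    # Names of resources found under module nodes only.
    module_resource_names := {resource.name |
        [path, value] := walk(tfstate)

        # Module nodes live under values.root_module ...
        array.slice(path, 0, 2) == ["values", "root_module"]

        # ... and their paths never pass through a "resources" key.
        count([key | key := path[_]; key == "resources"]) == 0

        resource := value.resources[_]
        resource_types[resource.type]
    }
    ```

    With this input the set contains "web" and "main" but not "decoy", because the decoy's parent path passes through "resources". Without the two path checks, the decoy would be collected as well.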

    For the last part, if you want to group them by type, add a complete rule like:

    # list of all resources of a given type. given type must be defined in the resource_types variable above
    resources = { resource_type: resources |
        some resource_type
        resource_types[resource_type]
        resources := { resource | 
            walk_resources[resource]
            resource.type == resource_type
        }
    }
    

    https://play.openpolicyagent.org/p/RlRZwibij9

    This replaces the original resources rule with a comprehension that iterates over each resource type and collects the resources matching that type.

    One extra pointer, which I've seen be a problem in these Terraform resource helper rules: you will want to reference the "complete" rule (see https://www.openpolicyagent.org/docs/latest/policy-language/#complete-definitions for details on what that means) rather than a "partial" rule. Here the partial rules are the ones that build a set of resources, whereas the complete rule assigns the result of a comprehension. The reason is that internally, as of writing this, OPA caches the values of complete rules but not those of partial rules. So if you then go and write a bunch of rules like:

    deny[msg] {
        r := resources["foo"]
        # enforce something for resources of type "foo"...
        ...
    }
    
    deny[msg] {
        r := resources["bar"]
        # enforce something for resources of type "bar"...
        ...
    }
    

    You want to ensure that each of them uses a cached value for resources rather than recalculating the set. The original version of your resources rule would suffer from that issue, as would referencing the walk_resources rule I've shown in those examples directly. It is something to keep an eye on, as it can have a pretty dramatic performance impact if you have a large input tfplan.
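
    Following the caching note above, one way to keep the walk itself inside a complete rule is to write both passes as comprehensions. This is only a sketch: all_resources and resources_by_type are made-up names, the tfstate and resource_types values are small hypothetical stand-ins for the playground's, and whether this helps in practice depends on the caching behaviour described above for your OPA version. It uses the two-argument form walk(x, [path, value]) from the question, with _ in place of the path since the path is not needed.

    ```rego
    package terraform_cache_sketch

    # Hypothetical stand-ins for the playground's tfstate and resource_types.
    tfstate := {"values": {"root_module": {
        "resources": [{"type": "foo", "name": "a"}],
        "child_modules": [{
            "address": "module.child",
            "resources": [{"type": "bar", "name": "b"}, {"type": "baz", "name": "c"}]
        }]
    }}}

    resource_types := {"foo", "bar"}

    # Complete rule: walk the state once, keeping resources of a wanted type.
    all_resources := {resource |
        walk(tfstate, [_, value])
        resource := value.resources[_]
        resource_types[resource.type]
    }

    # Complete rule: group the already-collected resources by type.
    resources_by_type := {resource_type: matching |
        resource_type := resource_types[_]
        matching := {resource |
            resource := all_resources[_]
            resource.type == resource_type
        }
    }
    ```

    The deny rules would then look up resources_by_type["foo"] and resources_by_type["bar"]. In this sample the "baz" resource is left out because its type is not in resource_types.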