
Vault Telemetry to CloudWatch


I'm trying to stream Vault telemetry through the CloudWatch agent's StatsD interface into CloudWatch metrics. However, the gauge metric names are coming through with the instance ID and tags embedded in them, which makes the metrics impossible to target from IaC-managed CloudWatch alarms.

For instance, the vault.core.unsealed telemetry event is coming through as vault_CLOUDWATCH_AGENT_HOSTNAME_core_unsealed_INSTANCE_NAME instead of the vault_core_unsealed that I was expecting.

Managing alarms for these metrics with Terraform is impossible because the metric names are dynamic: they depend on whichever instance is currently the leader of the cluster, which we have no control over.

In the Vault configuration HCL file, I have:

telemetry {
  statsd_address        = "127.0.0.1:8125"
  disable_hostname      = true
  enable_hostname_label = true
}

I have also tried several other combinations of the hostname settings, and they all seem to produce the same output. Is there a solution that I'm missing, or is using CloudWatch with StatsD to capture telemetry simply a flawed choice?


Solution

  • I got the gauge metric names to a usable point with a few non-obvious configuration changes.

    1. In the Vault telemetry stanza, set only disable_hostname = true alongside the StatsD address. Also setting enable_hostname_label simply moves the hostname to a different position in the metric name.

    2. The CloudWatch agent configuration has an option to omit hostnames, which can be enabled by adding the following to the agent configuration:

    {
      "agent": {
        "omit_hostname": true
      }
    }
    

    This prevents the CloudWatch agent from adding its own labels and suffixes to the gauge metric names, which cleans up some of the naming that is produced.

    3. (Optional) Adjust the appended dimensions in the CloudWatch agent configuration. By default, the agent appends the instance ID, image ID, autoscaling group name, and instance type. You may want to keep these. However, if you want to create metric alarms through IaC, you may need to remove some dimensions so that the metrics can be targeted (found via a direct match). To control which dimensions are automatically appended to the incoming telemetry, add the following to the custom config that replaces the default CloudWatch agent configuration:

    {
      "metrics": {
        "append_dimensions": {
          "AutoScalingGroupName": "${aws:AutoScalingGroupName}"
        }
      }
    }
    

    As long as you know the name of the autoscaling group that the instances run under, the gauge metric names coming in from the Vault telemetry will be generic enough to target for IaC purposes.
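
To tie the steps together, here is a rough sketch of what the two HCL pieces might look like once the changes above are in place. The Vault stanza is simply step 1 written out. The Terraform alarm is an illustration only: the variable name, alarm name, period, and threshold are my own assumptions, and the CWAgent namespace is the agent's default, so adjust it if your agent config sets a different one.

```hcl
# Vault configuration: step 1, with only the hostname prefix disabled.
telemetry {
  statsd_address   = "127.0.0.1:8125"
  disable_hostname = true
}
```

```hcl
# Terraform: hypothetical alarm on the now-stable metric name.
# "vault_asg_name" is an illustrative variable, not part of the original setup.
variable "vault_asg_name" {
  type        = string
  description = "Name of the autoscaling group running the Vault nodes"
}

resource "aws_cloudwatch_metric_alarm" "vault_sealed" {
  alarm_name          = "vault-sealed" # assumed name
  alarm_description   = "A Vault node in the group reports itself as sealed"
  namespace           = "CWAgent" # agent default; change if your agent config overrides it
  metric_name         = "vault_core_unsealed"
  statistic           = "Minimum"
  period              = 60 # assumed values; tune to your flush interval
  evaluation_periods  = 3
  comparison_operator = "LessThanThreshold"
  threshold           = 1
  treat_missing_data  = "breaching"

  # Must match the metric's dimension set exactly. If the console shows
  # additional dimensions on the metric, add them here as well.
  dimensions = {
    AutoScalingGroupName = var.vault_asg_name
  }
}
```

CloudWatch alarms only match a metric when the full dimension set is identical, so it is worth checking the metric in the console once before hard-coding the dimensions. If the alarm sits in an insufficient-data state, a missing or extra dimension is the first thing I would look at.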