I'm trying to get Vault telemetry streamed through Cloudwatch Agent's StatsD interface into CW metrics, however, the gauge metric values are coming through with prefixes based on the instance ID and tags that are making the metrics impossible to target for IaC managed Cloudwatch alarms.
For instance, the vault.core.unsealed
telemetry event is coming through as vault_CLOUDWATCH_AGENT_HOSTNAME_core_unsealed_INSTANCE_NAME
instead of the vault_core_unsealed
that I was expecting.
Managing the alarms for these metrics using Terraform is impossible because they will have dynamic names and based on whichever instance is determined as the current leader in the cluster which we have no control over.
In the Vault configuration HCL file, I have:
telemetry {
statsd_address = "127.0.0.1:8125"
disable_hostname = true
enable_hostname_label = true
}
along with several other combinations of hostname configuration values and they all seem to produce the same output. Is there a solution to this that I'm missing or just a flaw in deciding to use Cloudwatch with StatsD to capture telemetry?
Seemed to have gotten the gauge value names to a usable point with a few non-obvious configuration changes.
In the Vault telemetry stanza, only add the disable_hostname = true
property with the StatsD address. Adding the labels in addition will simply move the hostname to a different position in the metric name.
The Cloudwatch agent configuration has an option to omit hostnames which can be toggles by appending of setting a new configuration:
{
"agent": {
"omit_hostname": true
}
}
This will prevent the CloudWatch agent from adding its own labels and suffixes to the gauge metric names and clean up some of the naming that is produced
{
"metrics": {
"append_dimensions": {
"AutoScalingGroupName": "${aws:AutoScalingGroupName}"
}
}
}
As long as you know the name of the autoscaling group that the instances are targeted under, the gauge metric names coming in from the Vault telemetry will be named ambiguously enough to target them for IaC purposes.