Ubuntu 20.04 (QEMU), Consul 1.16.1, Vault 1.14.1
I am trying to use Consul to monitor Vault's systemd service with an OSService check, as described at https://developer.hashicorp.com/consul/docs/services/usage/checks#osservice-check, but the agent fails with a "not implemented" error.
My Consul service config:
services = [
  {
    name = "vault"
    port = 8200
    checks = [
      {
        http     = "vault1.foo.bar.com:8200/sys/health"
        interval = "15s"
        timeout  = "10s"
      },
      {
        name       = "Vault Service"
        os_service = "vault.service"
        interval   = "15s"
      },
      {
        name         = "Vault gRPC health check"
        grpc         = "vault1.foo.bar.com:8201"
        grpc_use_tls = true
        interval     = "10s"
      },
    ]
  }
]
I've tried several variations of the documented example, as well as bare-bones entries containing only os_service and interval. Invariably, I get nearly identical logs:
consul[82355]: ==> Starting Consul agent...
consul[82355]: Version: '1.16.1'
consul[82355]: Build Date: '2023-08-05 21:56:29 +0000 UTC'
consul[82355]: Node ID: '3e269875-156b-d7e7-8cfa-2b84c9487ef9'
consul[82355]: Node name: 'vault1'
consul[82355]: Datacenter: 'west' (Segment: '')
consul[82355]: Server: false (Bootstrap: false)
consul[82355]: Client Addr: [127.0.0.1] (HTTP: 8500, HTTPS: -1, gRPC: 8502, gRPC-TLS: -1, DNS: 8600)
consul[82355]: Cluster Addr: 10.12.1.94 (LAN: 8301, WAN: 8302)
consul[82355]: Gossip Encryption: true
consul[82355]: Auto-Encrypt-TLS: false
consul[82355]: ACL Enabled: false
consul[82355]: ACL Default Policy: allow
consul[82355]: HTTPS TLS: Verify Incoming: false, Verify Outgoing: false, Min Version: TLSv1_2
consul[82355]: gRPC TLS: Verify Incoming: false, Min Version: TLSv1_2
consul[82355]: Internal RPC TLS: Verify Incoming: false, Verify Outgoing: false (Verify Hostname: false), Min Version: TLSv1_2
consul[82355]: ==> Log data will now stream in as it occurs:
consul[82355]: 2023-08-30T21:46:51.171Z [WARN] agent: skipping file /etc/consul.d/.vault.hcl.swp, extension must be .hcl or .json, or config format must be set
consul[82355]: 2023-08-30T21:46:51.171Z [WARN] agent: skipping file /etc/consul.d/consul.env, extension must be .hcl or .json, or config format must be set
consul[82355]: 2023-08-30T21:46:51.184Z [WARN] agent.auto_config: skipping file /etc/consul.d/.vault.hcl.swp, extension must be .hcl or .json, or config format must be set
consul[82355]: 2023-08-30T21:46:51.184Z [WARN] agent.auto_config: skipping file /etc/consul.d/consul.env, extension must be .hcl or .json, or config format must be set
consul[82355]: 2023-08-30T21:46:51.186Z [INFO] agent.client.serf.lan: serf: EventMemberJoin: vault1 10.12.1.94
consul[82355]: 2023-08-30T21:46:51.186Z [INFO] agent.router: Initializing LAN area manager
consul[82355]: 2023-08-30T21:46:51.189Z [WARN] agent.client.serf.lan: serf: Failed to re-join any previously known node
consul[82355]: 2023-08-30T21:46:51.189Z [ERROR] agent: error creating OS Service client: error="not implemented"
consul[82355]: 2023-08-30T21:46:51.190Z [ERROR] agent: Error starting agent: error="Failed to register service \"vault\": not implemented"
consul[82355]: 2023-08-30T21:46:51.190Z [INFO] agent: Exit code: code=1
systemd[1]: consul.service: Main process exited, code=exited, status=1/FAILURE
systemd[1]: consul.service: Failed with result 'exit-code'.
systemd[1]: Failed to start "HashiCorp Consul - A service mesh solution".
systemd[1]: consul.service: Scheduled restart job, restart counter is at 5.
systemd[1]: Stopped "HashiCorp Consul - A service mesh solution".
systemd[1]: consul.service: Start request repeated too quickly.
systemd[1]: consul.service: Failed with result 'exit-code'.
systemd[1]: Failed to start "HashiCorp Consul - A service mesh solution".
Googling around for that produces thin results, and reading the code did not enlighten me.
Is this a bug, or PEBCAK?
I also struggled with this until I did some digging in the Consul source code: https://github.com/hashicorp/consul/blob/ac867d67e8240d64333483fdf3e234399740a189/agent/checks/os_service_unix.go#L15C43-L15C43
package checks

import "fmt"

// Non-Windows stub: both the constructor and Check always return an error.
type OSServiceClient struct {
}

func NewOSServiceClient() (*OSServiceClient, error) {
	return nil, fmt.Errorf("not implemented")
}

func (client *OSServiceClient) Check(serviceName string) error {
	return fmt.Errorf("not implemented")
}
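It seems it's simply... not implemented, at least on non-Windows systems. Because NewOSServiceClient always returns an error there, any service definition that contains an os_service check fails to register, which matches the "Failed to register service \"vault\": not implemented" line in your log and the agent exiting with code 1. Interestingly, the documentation indicates that the check is available for systemd units.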
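For comparison, here is roughly what a Linux implementation could look like if it simply wrapped systemctl. This is a hypothetical sketch, not Consul's code: it only reuses the NewOSServiceClient/Check names from the stub above, assumes systemctl is on the PATH, and treats any non-zero exit from systemctl is-active as "not running".

```go
// Hypothetical sketch of a systemd-backed OSServiceClient.
// NOT Consul's implementation; Consul 1.16.1 ships only the stub quoted above.
package main

import (
	"errors"
	"fmt"
	"os/exec"
	"strings"
)

// OSServiceClient checks units by shelling out to systemctl (assumed on PATH).
type OSServiceClient struct{}

// NewOSServiceClient returns a client; it does not verify that systemctl exists.
func NewOSServiceClient() (*OSServiceClient, error) {
	return &OSServiceClient{}, nil
}

// Check returns nil when `systemctl is-active <unit>` exits 0, an error otherwise.
func (client *OSServiceClient) Check(serviceName string) error {
	out, err := exec.Command("systemctl", "is-active", serviceName).Output()
	if err == nil {
		return nil
	}
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		// is-active exited non-zero; stdout typically holds the state, e.g. "inactive" or "failed".
		state := strings.TrimSpace(string(out))
		return fmt.Errorf("service %q is not active (state %q)", serviceName, state)
	}
	// systemctl could not be started at all, e.g. it is not installed.
	return fmt.Errorf("running systemctl: %w", err)
}

func main() {
	client, err := NewOSServiceClient()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	if err := client.Check("vault.service"); err != nil {
		fmt.Println("critical:", err)
		return
	}
	fmt.Println("passing")
}
```

Returning an error from Check rather than failing in the constructor keeps a missing or stopped unit from blocking agent startup, which is exactly the failure mode the stub causes.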
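As a workaround until it is implemented, you can define a script check that runs systemctl is-active vault.service instead (script checks must be enabled on the agent, for example with enable_local_script_checks = true):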
{
  name = "Vault Service"
  args = [
    "systemctl",
    "is-active",
    "vault.service",
  ]
  interval = "15s"
},
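This works because Consul derives a script check's status from the command's exit code: 0 is passing, 1 is warning, and anything else is critical. systemctl is-active exits 0 for an active unit and non-zero (typically 3) otherwise, so a stopped or failed vault.service shows up as critical. The sketch below mimics that mapping to illustrate the convention; scriptCheckStatus is a hypothetical helper, not part of Consul, and treating a command that cannot be started as critical is this sketch's own choice.

```go
// Sketch of Consul's documented script-check exit-code convention:
// 0 = passing, 1 = warning, any other code = critical.
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// scriptCheckStatus (hypothetical helper) runs args like an "args" check and maps the result.
func scriptCheckStatus(args []string) string {
	if len(args) == 0 {
		return "critical"
	}
	err := exec.Command(args[0], args[1:]...).Run()
	if err == nil {
		return "passing" // exit code 0
	}
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) && exitErr.ExitCode() == 1 {
		return "warning"
	}
	// Any other exit code, or the command could not be started at all.
	return "critical"
}

func main() {
	fmt.Println(scriptCheckStatus([]string{"systemctl", "is-active", "vault.service"}))
}
```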