I am using a GCP VM instance with Container-Optimized OS to run a container image, and I use cloud-init to initialize storage. When I initialize multiple drives (for example, several local SSDs), initialization takes some time and the Docker container starts before cloud-init has finished preparing the drives, which means I have a race condition. Notice that the Docker container has started before the file system has been initialized:
Apr 27 16:30:44 ch-s01r1 systemd[1]: Started docker-xxxxxx
...
2023-04-27 16:30:52,770 - subp.py[DEBUG]: Running command mkfs.ext4 -L ssd1 -m 0 ...
In Linux such problems are usually solved by adding dependencies between systemd services, but I cannot find any documentation on how to add a systemd dependency in either cloud-init or GCP Container-Optimized OS.
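For reference, in a regular Linux distribution I would express this as a systemd drop-in that orders the container's unit after cloud-init's final stage. A sketch of what I am looking for (the unit name konlet-startup.service is my guess at the unit behind gce-container-declaration; cloud-final.service is cloud-init's last boot stage):

```
# Hypothetical drop-in: /etc/systemd/system/konlet-startup.service.d/wait-for-cloud-init.conf
[Unit]
# Do not start the container until cloud-init has finished all of its stages.
After=cloud-final.service
Requires=cloud-final.service
```

But I cannot find where (or whether) such a drop-in can be installed on Container-Optimized OS.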
In terraform I have:
resource "google_compute_instance" "clickhouse-server" {
  ...
  metadata = {
    gce-container-declaration = module.gce-container.metadata_value
    user-data = data.cloudinit_config.clickhouse_config[count.index].rendered
  }
}
As I understand it, those two parts race against each other: user-data causes the file system initialization, and gce-container-declaration triggers the container start-up. How do I ensure that the container is not started before cloud-init has completed?
My cloud-init file (I use Terraform to expand macros):
...
fs_setup:
  - label: log
    filesystem: ext4
    device: log
    partition: auto
    cmd: mkfs.ext4 -L %(label)s -m 0 -E lazy_itable_init=0,lazy_journal_init=0,discard %(device)s
  - label: data
    filesystem: ext4
    device: data
    partition: auto
    cmd: mkfs.ext4 -L %(label)s -m 0 -E lazy_itable_init=0,lazy_journal_init=0,discard %(device)s
%{~ for n in range(ssd_count) ~}
  - label: ssd${n}
    filesystem: ext4
    device: ssd${n}
    partition: auto
    cmd: mkfs.ext4 -L %(label)s -m 0 -E lazy_itable_init=0,lazy_journal_init=0,discard %(device)s
%{~ endfor ~}
Answering my own question. The solution is to give up on Terraform's gce-container-declaration generation and do everything in cloud-init only. The hint comes from the GCP documentation page Container-Optimized OS / Running containers on instances. It calls for creating a file for a systemd service that starts the Docker container, and for starting Docker from cloud-init itself, as in:
#cloud-config
write_files:
- path: /etc/systemd/system/cloudservice.service
  content: |
    [Service]
    ExecStart=/usr/bin/docker run --rm --name=mycloudservice gcr.io/google-containers/busybox:latest /bin/sleep 3600
With both the file system creation and the Docker start-up driven by cloud-init itself, there is no more racing between scripts initialized from gce-container-declaration and user-data.
The final solution (with comments inside) looks like this (my-server.init.yml):
# Execute on every boot: mount file systems and start the container
runcmd:
- mount /dev/disk/by-id/google-log /mnt/disks/log
- mount -o discard,defaults,nobarrier /dev/md0 /mnt/disks/ssd
- systemctl daemon-reload
- systemctl start my.service

write_files:
# Execute only the first time the instance is created: create a RAID array
# of the SSD drives and format the file systems
- path: /var/lib/cloud/scripts/per-instance/fs-prepare.sh
  permissions: '0544'
  content: |
    #!/bin/bash
    mkfs.ext4 -L log -m 0 -E lazy_itable_init=0,lazy_journal_init=0,discard /dev/disk/by-id/google-log
    mkdir -p /mnt/disks/log
    mdadm --create /dev/md0 --level=0 --raid-devices=${ssd_count} %{ for n in range(ssd_count) } /dev/disk/by-id/google-local-nvme-ssd-${n} %{ endfor }
    mkfs.ext4 -L ssd -m 0 -E lazy_itable_init=0,lazy_journal_init=0,discard /dev/md0
    mkdir -p /mnt/disks/ssd
# systemd service descriptor which will start (and restart) the container
- path: /etc/systemd/system/my.service
  content: |
    [Unit]
    Description=Start my docker container
    [Service]
    ExecStart=/usr/bin/docker run --rm --name=my-server -p 9000:9000 -p 8123:8123 -p 9009:9009 ${mounts} ${my_server_image}
    ExecStop=/usr/bin/docker stop my-server
    ExecStopPost=/usr/bin/docker rm my-server
    Restart=on-failure
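If you want the unit to be usable outside runcmd as well, its [Unit] section could additionally declare an explicit dependency on Docker. The After=/Requires= lines below are my own suggestion, not part of the solution above (they are not strictly required here, since runcmd only starts the unit after cloud-init has already run):

```
[Unit]
Description=Start my docker container
# Suggested additions: make the Docker dependency explicit so systemd
# will not attempt to run the container before the Docker daemon is up.
After=docker.service
Requires=docker.service
```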
In Terraform's main.tf the following changes are required (notice the commented-out gce-container-declaration):
locals {
  mounts = [
    "-v /var/lib/my-server/config.d:/etc/my-server/config.d:ro",
    "-v /var/lib/my-server/users.d:/etc/my-server/users.d:ro",
    "-v /mnt/disks/ssd:/var/lib/my-data",
    "-v /mnt/disks/log:/var/log/my-server"
  ]
}
resource "google_compute_instance" "my-server" {
  ...
  attached_disk {
    source      = google_compute_disk.my-log-disk[count.index].self_link
    device_name = "log"
  }
  dynamic "scratch_disk" {
    for_each = range(var.ssd_count)
    content {
      interface = "NVME"
    }
  }
  metadata = {
    # gce-container-declaration = module.gce-container.metadata_value
    user-data = data.cloudinit_config.my_config[count.index].rendered
  }
}
data "cloudinit_config" "my_config" {
  ...
  part {
    content_type = "text/cloud-config"
    content = templatefile("${path.module}/my-server.init.yml", {
      my_server_image = var.my_server_image
      ssd_count       = var.ssd_count
      mounts          = join(" ", local.mounts)
    })
    filename = "my-server.init.yml"
  }
}
# Just to retrieve the Container-Optimized OS image name.
# DO NOT use it to render `google_compute_instance.metadata.gce-container-declaration`,
# because that would reintroduce the race between container start and cloud-init
# file system initialization.
module "gce-container" {
  source  = "terraform-google-modules/container-vm/google"
  version = "3.1.0"
}