google-cloud-platform, terraform-provider-gcp, cloud-init

fs_setup and container race in GCP


I am using a GCP VM instance with Container-Optimized OS to run a container image, and cloud-init to initialize storage. When I initialize multiple drives, for example several local SSDs, initialization takes some time, and the Docker container starts before cloud-init has finished preparing the drives, which means I have a race condition. Notice in the log below that the Docker container started before the file system was initialized:

Apr 27 16:30:44 ch-s01r1 systemd[1]: Started docker-xxxxxx
...
2023-04-27 16:30:52,770 - subp.py[DEBUG]: Running command mkfs.ext4 -L ssd1 -m 0 ...

On Linux such problems are normally solved by adding dependencies between systemd services, but I cannot find any documentation on how to add a systemd dependency either in cloud-init or in GCP Container-Optimized OS.
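
For illustration, this is the kind of unit-level ordering I mean on a stock systemd distribution (the unit and drop-in names here are placeholders, not something documented for COS):

# Hypothetical drop-in: /etc/systemd/system/my-container.service.d/override.conf
[Unit]
# Hold the container back until cloud-init's final stage has finished
After=cloud-final.service
Requires=cloud-final.service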

In terraform I have:

resource "google_compute_instance" "clickhouse-server" {
...
  metadata = {
    gce-container-declaration = module.gce-container.metadata_value
    user-data                 = data.cloudinit_config.clickhouse_config[count.index].rendered
  }

As I understand it, these two parts race each other: user-data drives the file system initialization, while gce-container-declaration triggers the container startup.

How do I ensure that the container does not start before cloud-init has completed?

My cloud-init file (I use Terraform to expand the macros):

...
fs_setup:
  - label: log
    filesystem: ext4
    device: log
    partition: auto
    cmd: mkfs.ext4 -L %(label)s -m 0 -E lazy_itable_init=0,lazy_journal_init=0,discard %(device)s
  - label: data
    filesystem: ext4
    device: data
    partition: auto
    cmd: mkfs.ext4 -L %(label)s -m 0 -E lazy_itable_init=0,lazy_journal_init=0,discard %(device)s
  %{~ for n in range(ssd_count) ~}
  - label: ssd${n}
    filesystem: ext4
    device: ssd${n}
    partition: auto
    cmd: mkfs.ext4 -L %(label)s -m 0 -E lazy_itable_init=0,lazy_journal_init=0,discard %(device)s
  %{~ endfor ~}
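
For reference, with ssd_count = 2 the %{ for } loop above expands into two plain fs_setup entries:

  - label: ssd0
    filesystem: ext4
    device: ssd0
    partition: auto
    cmd: mkfs.ext4 -L %(label)s -m 0 -E lazy_itable_init=0,lazy_journal_init=0,discard %(device)s
  - label: ssd1
    filesystem: ext4
    device: ssd1
    partition: auto
    cmd: mkfs.ext4 -L %(label)s -m 0 -E lazy_itable_init=0,lazy_journal_init=0,discard %(device)s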

Solution

  • Answering my own question. The solution is to give up on Terraform's gce-container-declaration generation and do everything in cloud-init only. The hint comes from the GCP documentation, Container-Optimized OS/Running containers on instances: it calls for creating a systemd service file that starts the Docker container, and for starting Docker from cloud-init itself, as in:

    #cloud-config
    
    write_files:
    - path: /etc/systemd/system/cloudservice.service
      content: |
        [Service]
        ExecStart=/usr/bin/docker run --rm --name=mycloudservice gcr.io/google-containers/busybox:latest /bin/sleep 3600
    
    runcmd:
    - systemctl daemon-reload
    - systemctl start cloudservice.service
    

    With file system creation and Docker startup both driven by cloud-init itself, there is no longer any racing between scripts triggered by gce-container-declaration and user-data.

    The final solution (with comments inline) looks like this (my-server.init.yml):

    # Execute on every boot: mount file systems and start the container
    runcmd:
      - mount /dev/disk/by-id/google-log /mnt/disks/log
      - mount -o discard,defaults,nobarrier /dev/md0 /mnt/disks/ssd
      - systemctl daemon-reload
      - systemctl start my.service
    write_files:
      # Execute only the first time, when the instance is created: build a RAID-0 array from the SSD drives and format the file systems
      - path: /var/lib/cloud/scripts/per-instance/fs-prepare.sh
        permissions: 0544
        content: |
          #!/bin/bash
          
          mkfs.ext4 -L log -m 0 -E lazy_itable_init=0,lazy_journal_init=0,discard /dev/disk/by-id/google-log
          mkdir -p /mnt/disks/log
          
          mdadm --create /dev/md0 --level=0 --raid-devices=${ssd_count} %{ for n in range(ssd_count) } /dev/disk/by-id/google-local-nvme-ssd-${n} %{ endfor }
          mkfs.ext4 -L ssd -m 0 -E lazy_itable_init=0,lazy_journal_init=0,discard /dev/md0
          mkdir -p /mnt/disks/ssd
      # Systemd service unit which will start (and restart) the container
      - path: /etc/systemd/system/my.service
        content: |
          [Unit]
          Description=Start my docker container
          
          [Service]
          ExecStart=/usr/bin/docker run --rm --name=my-server -p 9000:9000 -p 8123:8123 -p 9009:9009 ${mounts} ${my_server_image}  
          ExecStop=/usr/bin/docker stop my-server
          ExecStopPost=/usr/bin/docker rm my-server
          Restart=on-failure
    
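    Why this avoids the race without explicit unit dependencies: in stock cloud-init module ordering, write_files runs in the early init stage, the per-instance script runs in the final stage via scripts-per-instance, and runcmd is executed by scripts-user, which comes after scripts-per-instance. So the disks are formatted by fs-prepare.sh before the runcmd commands mount them and start the service. Assuming cloud-init's analyze subcommand is available on the COS image, the ordering can be sanity-checked on the instance:

    # Per-stage/module timing; scripts-per-instance should finish before scripts-user
    sudo cloud-init analyze show
    # Container start time, for comparison with the cloud-init stages above
    sudo journalctl -u my.service -b --no-pager
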

    In Terraform's main.tf the following changes are required (note the commented-out gce-container-declaration):

    locals {
      mounts = [
        "-v /var/lib/my-server/config.d:/etc/my-server/config.d:ro",
        "-v /var/lib/my-server/users.d:/etc/my-server/users.d:ro",
        "-v /mnt/disks/ssd:/var/lib/my-data",
        "-v /mnt/disks/log:/var/log/my-server"
      ]
    }
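
    With these values, ${mounts} in the unit's ExecStart line renders to a single space-joined string of -v options:

    -v /var/lib/my-server/config.d:/etc/my-server/config.d:ro -v /var/lib/my-server/users.d:/etc/my-server/users.d:ro -v /mnt/disks/ssd:/var/lib/my-data -v /mnt/disks/log:/var/log/my-server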
    
    resource "google_compute_instance" "my-server" {
    ...
      attached_disk {
        source      = google_compute_disk.my-log-disk[count.index].self_link
        device_name = "log"
      }
    
      dynamic "scratch_disk" {
        for_each = range(var.ssd_count)
        content {
          interface = "NVME"
        }
      }
    
      metadata = {
    #    gce-container-declaration = module.gce-container.metadata_value
        user-data                 = data.cloudinit_config.my_config[count.index].rendered
      }
    
    }
    
    data "cloudinit_config" "my_config" {
    ...
      part {
        content_type = "text/cloud-config"
        content = templatefile("${path.module}/my-server.init.yml", {
          my_server_image = var.my_server_image
          ssd_count = var.ssd_count
          mounts = join(" ", local.mounts)
        })
        filename = "my-server.init.yml"
      }
    }
    
    # Just to retrieve the Container-Optimized OS image name.
    # DO NOT use it to render `google_compute_instance.metadata.gce-container-declaration`,
    # because that would cause a race between container start and cloud-init file system
    # initialization.
    module "gce-container" {
      source  = "terraform-google-modules/container-vm/google"
      version = "3.1.0"
    }
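
    For completeness, a sketch of how the module output can still supply the boot image (the source_image output name is my reading of the module's documentation, so verify it against the module version in use):

    boot_disk {
      initialize_params {
        # COS image resolved by the container-vm module (output name assumed)
        image = module.gce-container.source_image
      }
    }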