docker kubernetes process amazon-eks crashloopbackoff

What causes the error "standard_init_linux.go:228: exec user process caused: bad address" when starting a container?


I'm seeing the error "standard_init_linux.go:228: exec user process caused: bad address" in Pod container logs in an EKS Kubernetes cluster, and I'm trying to find out what could cause it. I've searched Google and Stack Overflow, but every result containing "standard_init_linux.go:228: exec user process caused:" covers a reason other than "bad address". There is plenty of information on "exec format error", "permission denied", "no such file or directory", and so on, but seemingly nothing on "bad address", so I haven't found a good explanation. The pods with this error were in a CrashLoopBackOff state: they could not start, and that error was the only line in the container's log. The error occurred on various EC2 worker nodes and for various applications (i.e., different Docker images). My question is strictly: what could cause this error when it contains "bad address"? The condition went away on all nodes when Docker was restarted on one of the nodes that hosted some of the crashing pods.


Solution

  • As a long-term consumer of standard_init_linux.go errors :) I was intrigued, as I had not seen the "bad address" variant before, so I wanted to dig in.

    I searched the https://github.com/moby/moby repo for the string 'bad address' and found https://github.com/moby/moby/blob/master/vendor/golang.org/x/sys/unix/zerrors_linux_amd64.go#L669, which is the auto-generated error list definition.

    // Error table
    var errorList = [...]struct {
    [snip]
    {13, "EACCES", "permission denied"},
    {14, "EFAULT", "bad address"},
    {15, "ENOTBLK", "block device required"},
    

    That table didn't reveal much on its own, but in context "bad address" is clearly the message for a standard errno: number 14, EFAULT. The Linux kernel source at https://github.com/torvalds/linux/blob/master/include/uapi/asm-generic/errno-base.h confirms this.

    With this information I had better context for a Google search (https://google.com/search?q=linux+errno+14+bad+address), and it seems very likely that you hit a bug in some code. The error is commonly returned when code tries to access memory outside its valid address space; why this doesn't cause a SEGV instead, I don't know. If you're interested, it's worth searching for 'SEGV versus EFAULT'.

    Given that restarting dockerd resolved this, I think it's likely that dockerd got wedged and that this was a transient error.