pythongoogle-app-enginegoogle-cloud-platformmanaged-vmgoogle-managed-vm

Google managed VM module stuck in reboot loop


I've been trying to add a new App Engine Module that uses Managed VMs instead of the default GAE sandbox. The intent is to provide a module where I can run newer versions of SciPy and NumPy, which can be called by my user-facing Module. I have successfully built and run my Docker images/containers locally, but have run into a lot of issues when I try to deploy to a custom version on the Google servers.

The following is from the serial console output for the Managed VM module instance, which continues to reboot due to issues that appear to be beyond my control.

Has anyone else run into these? Did I miss something during the configuration/deployment process?

FWIW: I've used GAE for several years and even contributed to it during my time at Google. I also have experience using Modules, as well as with Docker. The documentation and tooling around Managed VMs seems to be quite immature at the moment, and I've run out of steam trying to fight it. I need help.

Dec 09 00:52:41 vm_runtime_init: start 'pull_app'.
[   24.288054] docker0: port 1(veth8d67b7c) entered forwarding state
Dec  9 00:52:56 gae-mvm-vmv7-tsia kernel: [   24.288054] docker0: port 1(veth8d67b7c) entered forwarding state
Dec 09 00:52:57 Pulling GAE_FULL_APP_CONTAINER: appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7
gcm-Heartbeat:1449622390000
Dec 09 00:53:13 ERROR: Timed out while trying to pull appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7 from registry!
===== Unexpected error during VM startup =====
=== Dump of VM runtime system logs follows ===
WARNING: HTTP 404 error while fetching metadata key gae_cloud_sql_instances. Will treat it as an empty string.
WARNING: HTTP 404 error while fetching metadata key gae_cloud_sql_proxy_image_name. Will treat it as an empty string.
WARNING: HTTP 404 error while fetching metadata key gae_extra_nginx_confs. Will treat it as an empty string.
WARNING: HTTP 404 error while fetching metadata key gae_redirect_appengine_googleapis_com. Will treat it as an empty string.
WARNING: HTTP 404 error while fetching metadata key gae_http_loadbalancer_enabled. Will treat it as an empty string.
WARNING: HTTP 404 error while fetching metadata key gae_loadbalancer. Will treat it as an empty string.
WARNING: HTTP 404 error while fetching metadata key gae_loadbalancer_ip. Will treat it as an empty string.
WARNING: HTTP 404 error while fetching metadata key gae_memcache_proxy. Will treat it as an empty string.
WARNING: HTTP 404 error while fetching metadata key gae_monitoring_image_name. Will treat it as an empty string.
WARNING: HTTP 404 error while fetching metadata key gae_use_cloud_monitoring. Will treat it as an empty string.
vm_runtime_init: Dec 09 00:52:40 Invoking all VM runtime components. /dev/fd/63
vm_runtime_init: Dec 09 00:52:40 vm_runtime_init: start 'allow_ssh'.
vm_runtime_init: Dec 09 00:52:40 vm_runtime_init: Done start 'allow_ssh'.
vm_runtime_init: Dec 09 00:52:40 vm_runtime_init: start 'unlocker'.
vm_runtime_init: Dec 09 00:52:40 vm_runtime_init: Done start 'unlocker'.
vm_runtime_init: Dec 09 00:52:40 vm_runtime_init: start 'fluentd_logger'.
vm_runtime_init: Dec 09 00:52:41 vm_runtime_init: Done start 'fluentd_logger'.
vm_runtime_init: Dec 09 00:52:41 vm_runtime_init: start 'pull_app'.
vm_runtime_init: Dec 09 00:52:57 Pulling GAE_FULL_APP_CONTAINER: appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7
Error pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, Untar re-exec error: exit status 1: output: unexpected EOF
Error pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, Untar re-exec error: exit status 1: output: unexpected EOF
Error pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, Untar re-exec error: exit status 1: output: unexpected EOF
Error pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, Untar re-exec error: exit status 1: output: unexpected EOF
Error pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, Untar re-exec error: exit status 1: output: unexpected EOF
Dec 09 00:53:13 ERROR: Timed out while trying to pull appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7 from registry!
Dec 09 00:52:40 Invoking all VM runtime components. /dev/fd/63
Dec 09 00:52:40 vm_runtime_init: start 'allow_ssh'.
Dec 09 00:52:40 vm_runtime_init: Done start 'allow_ssh'.
Dec 09 00:52:40 vm_runtime_init: start 'unlocker'.
Dec 09 00:52:40 vm_runtime_init: Done start 'unlocker'.
Dec 09 00:52:40 vm_runtime_init: start 'fluentd_logger'.
8aa8a33b8daa451d5595b951aeecad772a23d65b6592ac07cae6265cc74b6312
Dec 09 00:52:41 vm_runtime_init: Done start 'fluentd_logger'.
Dec 09 00:52:41 vm_runtime_init: start 'pull_app'.
Dec 09 00:52:57 Pulling GAE_FULL_APP_CONTAINER: appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7
Using default tag: latest
Pulling repository appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7
b626d012b369: Pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7
b626d012b369: Pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, endpoint: https://appengine.gcr.io/v1/
b626d012b369: Pulling dependent layers
643a001c5ee0: Download complete
559718b5f880: Download complete
8f8068a6a6b4: Download complete
16d49c9e1091: Pulling metadata
16d49c9e1091: Pulling fs layer
16d49c9e1091: Download complete
54f405e77b26: Pulling metadata
54f405e77b26: Pulling fs layer
54f405e77b26: Download complete
36e2f6c710be: Pulling metadata
36e2f6c710be: Pulling fs layer
36e2f6c710be: Download complete
e8aed8091139: Pulling metadata
e8aed8091139: Pulling fs layer
e8aed8091139: Error downloading dependent layers
b626d012b369: Error pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, endpoint: https://appengine.gcr.io/v1/, Untar re-exec error: exit status 1: output: unexpected EOF
b626d012b369: Error pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, Untar re-exec error: exit status 1: output: unexpected EOF
Retrying docker pull.
Using default tag: latest
Pulling repository appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7
b626d012b369: Pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7
b626d012b369: Pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, endpoint: https://appengine.gcr.io/v1/
b626d012b369: Pulling dependent layers
643a001c5ee0: Download complete
559718b5f880: Download complete
8f8068a6a6b4: Download complete
16d49c9e1091: Download complete
54f405e77b26: Download complete
36e2f6c710be: Download complete
e8aed8091139: Pulling metadata
e8aed8091139: Pulling fs layer
e8aed8091139: Error downloading dependent layers
b626d012b369: Error pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, endpoint: https://appengine.gcr.io/v1/, Untar re-exec error: exit status 1: output: unexpected EOF
b626d012b369: Error pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, Untar re-exec error: exit status 1: output: unexpected EOF
Retrying docker pull.
Using default tag: latest
Pulling repository appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7
b626d012b369: Pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7
b626d012b369: Pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, endpoint: https://appengine.gcr.io/v1/
b626d012b369: Pulling dependent layers
643a001c5ee0: Download complete
559718b5f880: Download complete
8f8068a6a6b4: Download complete
16d49c9e1091: Download complete
54f405e77b26: Download complete
36e2f6c710be: Download complete
e8aed8091139: Pulling metadata
e8aed8091139: Pulling fs layer
e8aed8091139: Error downloading dependent layers
b626d012b369: Error pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, endpoint: https://appengine.gcr.io/v1/, Untar re-exec error: exit status 1: output: unexpected EOF
b626d012b369: Error pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, Untar re-exec error: exit status 1: output: unexpected EOF
Retrying docker pull.
Using default tag: latest
Pulling repository appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7
b626d012b369: Pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7
b626d012b369: Pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, endpoint: https://appengine.gcr.io/v1/
b626d012b369: Pulling dependent layers
643a001c5ee0: Download complete
559718b5f880: Download complete
8f8068a6a6b4: Download complete
16d49c9e1091: Download complete
54f405e77b26: Download complete
36e2f6c710be: Download complete
e8aed8091139: Pulling metadata
e8aed8091139: Pulling fs layer
e8aed8091139: Error downloading dependent layers
b626d012b369: Error pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, endpoint: https://appengine.gcr.io/v1/, Untar re-exec error: exit status 1: output: unexpected EOF
b626d012b369: Error pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, Untar re-exec error: exit status 1: output: unexpected EOF
Retrying docker pull.
Using default tag: latest
Pulling repository appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7
b626d012b369: Pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7
b626d012b369: Pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, endpoint: https://appengine.gcr.io/v1/
b626d012b369: Pulling dependent layers
643a001c5ee0: Download complete
559718b5f880: Download complete
8f8068a6a6b4: Download complete
16d49c9e1091: Download complete
54f405e77b26: Download complete
36e2f6c710be: Download complete
e8aed8091139: Pulling metadata
e8aed8091139: Pulling fs layer
e8aed8091139: Error downloading dependent layers
b626d012b369: Error pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, endpoint: https://appengine.gcr.io/v1/, Untar re-exec error: exit status 1: output: unexpected EOF
b626d012b369: Error pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, Untar re-exec error: exit status 1: output: unexpected EOF
Retrying docker pull.
CONTAINER ID        IMAGE                                    COMMAND                  CREATED             STATUS              PORTS               NAMES
8aa8a33b8daa        gcr.io/google_appengine/fluentd-logger   "/opt/google-fluentd/"   33 seconds ago      Up 32 seconds                           insane_panini
Container: 8aa8a33b8daa
========= rebooting. ========================

INIT: 
INIT: Sending processes the TERM signal


INIT: Sending processes the KILL signal

Dec  9 00:53:14 gae-mvm-vmv7-tsia init: Switching to runlevel: 1
gcm-StatusUpdate:TIME=1449622394000;STATUS=COMMAND_FAILED;INVOCATION_ID=0
[[36minfo[39;49m] Using makefile-style concurrent boot in runlevel 1.
Dec  9 00:53:15 gae-mvm-vmv7-tsia rpc.statd[1758]: Caught signal 15, un-registering and exiting
Dec  9 00:53:15 gae-mvm-vmv7-tsia google: shutdown script found in metadata.
[....] Stopping NFS common utilities: idmapd statd[?25l[?1c7[1G[[32m ok [39;49m8[?25h[?0c.
Dec  9 00:53:15 gae-mvm-vmv7-tsia shutdownscript: Running shutdown script /var/run/google.shutdown.script
Dec  9 00:53:15 gae-mvm-vmv7-tsia rpcbind: rpcbind terminating on signal. Restart with "rpcbind -w"
[....] Stopping rpcbind daemon...[?25l[?1c7[1G[[32m ok [39;49m8[?25h[?0c.
Stopping supervisor: supervisord.
udhcpd: Disabled. Edit /etc/default/udhcpd to enable it.
[....] Unmounting iscsi-backed filesystems: Unmounting all devices marked _netdev[?25l[?1c7[1G[[32m ok [39;49m8[?25h[?0c.
Dec  9 00:53:16 gae-mvm-vmv7-tsia iscsid: iscsid shutting down.
[....] Unmounting iscsi-backed filesystems: Unmounting all devices marked _netdev[?25l[?1c7[1G[[32m ok [39;49m8[?25h[?0c.
[....] Disconnecting iSCSI targets:iscsiadm: No matching sessions found
[?25l[?1c7[1G[[32m ok [39;49m8[?25h[?0c.
[....] Stopping iSCSI initiator service:[?25l[?1c7[1G[[32m ok [39;49m8[?25h[?0c.
Dec  9 00:53:16 gae-mvm-vmv7-tsia shutdownscript: Finished running shutdown script /var/run/google.shutdown.script
[....] Stopping Docker: docker[?25l[?1c7[1G[[32m ok [39;49m8[?25h[?0c.
[....] Stopping The Kubernetes container manager: kubelet[?25l[?1c7[1G[[32m ok [39;49m8[?25h[?0c.
[....] Stopping enhanced syslogd: rsyslogd[?25l[?1c7[1G[[32m ok [39;49m8[?25h[?0c.
[   54.500923] docker0: port 1(veth8d67b7c) entered disabled state
[   54.512521] docker0: port 1(veth8d67b7c) entered disabled state
[   54.522249] device veth8d67b7c left promiscuous mode
[   54.527554] docker0: port 1(veth8d67b7c) entered disabled state
Terminating on signal number 15
Traceback (most recent call last):
  File "/usr/share/google/google_daemon/manage_accounts.py", line 94, in <module>
    options.daemon, options.force, options.debug)
  File "/usr/share/google/google_daemon/manage_accounts.py", line 65, in Main
    manager_daemon.StartDaemon()
  File "/usr/share/google/google_daemon/accounts_manager_daemon.py", line 73, in StartDaemon
    self.accounts_manager.Main()
  File "/usr/share/google/google_daemon/accounts_manager.py", line 87, in Main
    writer.close()
IOError: [Errno 32] Broken pipe
[....] Asking all remaining processes to terminate...acpid: exiting

[?25l[?1c7[1G[[32m ok [39;49m8[?25h[?0cdone.
[....] All processes ended within 1 seconds....[?25l[?1c7[1G[[32m ok [39;49m8[?25h[?0cdone.
[[36minfo[3
INIT: Sending processes the TERM signal


INIT: Sending processes the KILL signal

sulogin: root account is locked, starting shell
root@gae-mvm-vmv7-tsia:~# 

Edit: Additional information from the shutdown.log is below. The docker logs command is not being run in any of my code or Dockerfiles -- I assume there is a bug in the way Google is using it on their end.

2015-12-08 17:08:22.194 Sending SIGUSR1 to fluentd to trigger a log flush.
2015-12-08 17:08:22.194 605e9f1ad747e63560fdc28a8c7f3c276d77255edd0a65381f7ad2f9f8eafd2a
2015-12-08 17:08:22.194 ---------------------------------------------------------------------
2015-12-08 17:08:22.194 ---------------App was unhealthy, grabbing debug logs----------------
2015-12-08 17:08:22.194 --------------------------App stdout/stderr--------------------------
2015-12-08 17:08:22.194 /usr/share/vm_runtime/vm_shutdown.sh: line 22: /var/run/app.cid: No such file or directory
2015-12-08 17:08:22.194 docker: "logs" requires 1 argument.
2015-12-08 17:08:22.194 See 'docker logs --help'.
2015-12-08 17:08:22.194 
2015-12-08 17:08:22.194 Usage:  docker logs [OPTIONS] CONTAINER
2015-12-08 17:08:22.194 
2015-12-08 17:08:22.194 Fetch the logs of a container
2015-12-08 17:08:22.194 ---------------------------------------------------------------------
2015-12-08 17:08:22.194 --------------------------Tail of app logs---------------------------
2015-12-08 17:08:22.194 tail: cannot open `/var/log/app_engine/app/app.0.log.json' for reading: No such file or directory
2015-12-08 17:08:22.194 ---------------------------------------------------------------------

Solution

  • EDIT

    This answer is mostly correct, but inferring that the image doesn't exist based on curl was flawed. python-compat should work for you, but python is also a valid image, as can be seen from running docker pull gcr.io/google_appengine/python:

    $ docker pull gcr.io/google_appengine/python
    Pulling repository gcr.io/google_appengine/python
    ac7db0912786: Download complete 
    643a001c5ee0: Download complete 
    559718b5f880: Download complete 
    8f8068a6a6b4: Download complete 
    16d49c9e1091: Download complete 
    54f405e77b26: Download complete 
    36e2f6c710be: Download complete 
    e8aed8091139: Download complete 
    8f0415d8e4e9: Download complete 
    15ed20635873: Download complete 
    6d70c8850a43: Download complete 
    93ae290c32a1: Download complete 
    7f766358fa71: Download complete 
    7f4a74c30dc4: Download complete 
    b51802c69e61: Download complete 
    Status: Downloaded newer image for gcr.io/google_appengine/python:latest
    

    In discussion with contributor jonparrott over at the github repo for the gcr.io/google_appengine/python docker image, the relation of the various python Managed VM / Custom Runtime docker images was clarified in a comment.

    So, I think the issue you're seeing here, given that the VM fails to become healthy, could be related to the image you sourced from, your Dockerfile, your app code, or infrastructure. The first 3 are more likely than the last one, but it's not inconceivable. It seems the "Untar re-exec error: exit status 1: output: unexpected EOF" errors are the first outward manifestation of the problem during the serial console output.

    This might be worth putting into an issue report for the Google Cloud Platform Public Issue Tracker along with information like your Dockerfile, your app's code if necessary, time-frames if it only occurred transiently, etc.


    Original Answer

    If you check the container registry, the image you're attempting to source FROM doesn't exist:

    curl -v -X HEAD https://gcr.io/google_appengine/python
    * Hostname was NOT found in DNS cache
    *   Trying 74.125.193.82...
    * Connected to gcr.io (74.125.193.82) port 443 (#0)
    * successfully set certificate verify locations:
    *   CAfile: none
      CApath: /etc/ssl/certs
    * SSLv3, TLS handshake, Client hello (1):
    * SSLv3, TLS handshake, Server hello (2):
    * SSLv3, TLS handshake, CERT (11):
    * SSLv3, TLS handshake, Server key exchange (12):
    * SSLv3, TLS handshake, Server finished (14):
    * SSLv3, TLS handshake, Client key exchange (16):
    * SSLv3, TLS change cipher, Client hello (1):
    * SSLv3, TLS handshake, Finished (20):
    * SSLv3, TLS change cipher, Client hello (1):
    * SSLv3, TLS handshake, Finished (20):
    * SSL connection using ECDHE-RSA-AES128-GCM-SHA256
    * Server certificate:
    *        subject: C=US; ST=California; L=Mountain View; O=Google Inc; CN=*.googlecode.com
    *        start date: 2015-12-02 14:40:36 GMT
    *        expire date: 2016-03-01 00:00:00 GMT
    *        subjectAltName: gcr.io matched
    *        issuer: C=US; O=Google Inc; CN=Google Internet Authority G2
    *        SSL certificate verify ok.
    > HEAD /google_appengine/python HTTP/1.1
    > User-Agent: curl/7.35.0
    > Host: gcr.io
    > Accept: */*
    > 
    < HTTP/1.1 404 Not Found
    < Date: Thu, 10 Dec 2015 19:14:52 GMT
    < Content-Type: text/html; charset=UTF-8
    * Server Docker Registry is not blacklisted
    < Server: Docker Registry
    < Content-Length: 1584
    < X-XSS-Protection: 1; mode=block
    < X-Frame-Options: SAMEORIGIN
    < Alternate-Protocol: 443:quic,p=0
    < Alt-Svc: clear
    < 
    

    The actually-existing container registry image for the python runtime is:

    gcr.io/google_appengine/python-compat
    

    This should probably be the key to solving your deployment failures.