I have been trying to set up ICP EE on a single node, but I keep getting an installation failure once I reach the "Deploying monitoring service" task.
This particular task runs for about 30 minutes and then fails; the error log I got is below.
Is there something I need to do differently?
I used the basic install steps from the Knowledge Center for this.
TASK [monitoring : Deploying monitoring service]
*******************************
fatal: [localhost]: FAILED! => {
"changed":true,
"cmd":"kubectl apply --force --overwrite=true -f /installer/playbook/..//cluster/cfc-components/monitoring/",
"delta":"0:30:37.425771",
"end":"2018-02-26 17:19:04.780643",
"failed":true,
"rc":1,
"start":"2018-02-26 16:48:27.354872",
"stderr":"Error from server: error when creating \"/installer/cluster/cfc-components/monitoring/grafana-router-config.yaml\": timeout\nError from server (Timeout): error when creating \"/installer/cluster/cfc-components/monitoring/kube-state-metrics-deployment.yaml\": the server was unable to return a response in the time allotted, but may still be processing the request (post deployments.extensions)",
"stderr_lines":[
"Error from server: error when creating \"/installer/cluster/cfc-components/monitoring/grafana-router-config.yaml\": timeout",
"Error from server (Timeout): error when creating \"/installer/cluster/cfc-components/monitoring/kube-state-metrics-deployment.yaml\": the server was unable to return a response in the time allotted, but may still be processing the request (post deployments.extensions)"
],
"stdout":"configmap \"alert-rules\" created\nconfigmap \"monitoring-prometheus-alertmanager\" created\ndeployment \"monitoring-prometheus-alertmanager\" created\nconfigmap \"alertmanager-router-nginx-config\" created\nservice \"monitoring-prometheus-alertmanager\" created\ndeployment \"monitoring-exporter\" created\nservice \"monitoring-exporter\" created\nconfigmap \"monitoring-grafana-config\" created\ndeployment \"monitoring-grafana\" created\nconfigmap \"grafana-entry-config\" created\nservice \"monitoring-grafana\" created\njob \"monitoring-grafana-ds\" created\nconfigmap \"grafana-ds-entry-config\" created\nservice \"monitoring-prometheus-kubestatemetrics\" created\ndaemonset \"monitoring-prometheus-nodeexporter-amd64\" created\ndaemonset \"monitoring-prometheus-nodeexporter-ppc64le\" created\ndaemonset \"monitoring-prometheus-nodeexporter-s390x\" created\nservice \"monitoring-prometheus-nodeexporter\" created\nconfigmap \"monitoring-prometheus\" created\ndeployment \"monitoring-prometheus\" created\nconfigmap \"prometheus-router-nginx-config\" created\nservice \"monitoring-prometheus\" created\nconfigmap \"monitoring-router-entry-config\" created",
"stdout_lines":[
"configmap \"alert-rules\" created",
"configmap \"monitoring-prometheus-alertmanager\" created",
"deployment \"monitoring-prometheus-alertmanager\" created",
"configmap \"alertmanager-router-nginx-config\" created",
"service \"monitoring-prometheus-alertmanager\" created",
"deployment \"monitoring-exporter\" created",
"service \"monitoring-exporter\" created",
"configmap \"monitoring-grafana-config\" created",
"deployment \"monitoring-grafana\" created",
"configmap \"grafana-entry-config\" created",
"service \"monitoring-grafana\" created",
"job \"monitoring-grafana-ds\" created",
"configmap \"grafana-ds-entry-config\" created",
"service \"monitoring-prometheus-kubestatemetrics\" created",
"daemonset \"monitoring-prometheus-nodeexporter-amd64\" created",
"daemonset \"monitoring-prometheus-nodeexporter-ppc64le\" created",
"daemonset \"monitoring-prometheus-nodeexporter-s390x\" created",
"service \"monitoring-prometheus-nodeexporter\" created",
"configmap \"monitoring-prometheus\" created",
"deployment \"monitoring-prometheus\" created",
"configmap \"prometheus-router-nginx-config\" created",
"service \"monitoring-prometheus\" created",
"configmap \"monitoring-router-entry-config\" created"
]
}
Does this node have at least 16G of memory (or even 32G)? It may be that the host is overwhelmed by the initial load as pods are coming online.
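If you're not sure, a quick way to check memory and load on the host (assuming a standard Linux node) is something like:
free -h
uptime
ps aux --sort=-%mem | head -n 10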
The second thing to test is what happens when you apply this directory yourself. You can re-run the same action from the command line:
cd cluster/
kubectl apply --force --overwrite=true -f cfc-components/monitoring/
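Once the apply finishes (or times out), it can help to check whether the monitoring objects listed in the stdout above actually came up; roughly something like this (the names come from your log, adjust as needed):
kubectl -n kube-system get deployments | grep monitoring
kubectl -n kube-system get jobs | grep monitoring-grafana-ds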
Then you can inspect what's going on behind the scenes:
kubectl -n kube-system get pod -o wide
journalctl -ru kubelet -o cat | head -n 500 > kubelet-logs.txt
Does the kubelet complain about Docker being unhealthy?
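To surface unhealthy pods and Docker-related kubelet errors quickly, a rough sketch (the grep patterns are only suggestions):
kubectl -n kube-system get pod -o wide | grep -Ev 'Running|Completed'
journalctl -ru kubelet -o cat | grep -iE 'docker|unhealthy' | head -n 50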
If some pod demonstrates it is unhealthy (from the pod listing or kubelet logs above), then describe it and verify whether any of the events indicate why it is failing:
kubectl -n kube-system describe pod [failing-pod-name]
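The cluster-wide event stream can also point at scheduling or image-pull problems; for example, one way to sort the most recent events:
kubectl -n kube-system get events --sort-by='.lastTimestamp' | tail -n 30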
If you haven't already configured kubectl on the host to interact with the system, or if the auth-idp pod has not yet deployed, you can use the following steps to configure kubectl:
docker run -e LICENSE=accept -v /usr/local/bin:/data \
ibmcom/icp-inception:[YOUR_VERSION] \
cp /usr/local/bin/kubectl /data
export KUBECONFIG=/var/lib/kubelet/kubelet-config
You may want to set the KUBECONFIG file in your shell profile (e.g. .bash_profile) so it applies for each terminal session.
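After that, a quick sanity check that kubectl can actually reach the API server (any read-only call works):
kubectl get nodes
kubectl -n kube-system get pod | head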