I am trying to import a pipeline into StreamSets during container startup by using the Docker CMD instruction in the Dockerfile. The image builds, and the container is created without any error, but it exits with code 0, so it never comes up. Here is what I did:
Dockerfile:
FROM streamsets/datacollector:3.18.1
COPY myPipeline.json /pipelinejsonlocation/
EXPOSE 18630
ENTRYPOINT ["/bin/sh"]
CMD ["/opt/streamsets-datacollector-3.18.1/bin/streamsets","cli","-U", "http://localhost:18630", \
"-u", \
"admin", \
"-p", \
"admin", \
"store", \
"import", \
"-n", \
"myPipeline", \
"--stack", \
"-f", \
"/pipelinejsonlocation/myPipeline.json"]
Build image:
docker build -t cmp/sdc .
Run image:
docker run -p 18630:18630 -d --name sdc cmp/sdc
This outputs the container id. But the container is in the Exited status as shown below.
docker ps -a
CONTAINER ID   IMAGE     COMMAND                  CREATED         STATUS                     PORTS     NAMES
537adb1b05ab   cmp/sdc   "/bin/sh /opt/stream…"   5 seconds ago   Exited (0) 3 seconds ago             sdc
When I do not specify the CMD instruction in the Dockerfile, the StreamSets container spins up, and when I then run the streamsets import command in a shell inside the running container, it works. But how do I get this done during provisioning itself? Is there something I am missing in the Dockerfile?
In your Dockerfile you overwrite the default CMD and ENTRYPOINT from the StreamSets Data Collector Dockerfile. So the container only executes your command during startup and exits without errors afterwards. This is why your container is in Exited (0) status.
In general this is good and expected behavior: a container stays up only as long as its main process runs. If you want to keep your container alive, you need to execute another command in the foreground that never ends. But unfortunately, you cannot run multiple CMDs in your Dockerfile; only the last one takes effect.
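You can check which ENTRYPOINT and CMD an image will effectively run with docker inspect; for your image tag from above the output would look roughly like this:
docker inspect --format '{{json .Config.Entrypoint}} {{json .Config.Cmd}}' cmp/sdc
# ["/bin/sh"] ["/opt/streamsets-datacollector-3.18.1/bin/streamsets","cli","-U", ...]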
I dug a little deeper. The default entrypoint of the image is ENTRYPOINT ["/docker-entrypoint.sh"]. This script sets up a few things and then starts the Data Collector.
The Data Collector must already be running before the pipeline can be imported. So a solution is to copy the default docker-entrypoint.sh and modify it so that it starts the Data Collector and imports the pipeline afterwards. You could do it like this:
Dockerfile:
FROM streamsets/datacollector:3.18.1
COPY myPipeline.json /pipelinejsonlocation/
# Replace docker-entrypoint.sh
COPY docker-entrypoint.sh /docker-entrypoint.sh
EXPOSE 18630
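Note that COPY preserves the file mode from the build context, so make sure your local copy of the script is executable before building the image, otherwise the container will fail to start with a permission error:
chmod +x docker-entrypoint.sh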
docker-entrypoint.sh (https://github.com/streamsets/datacollector-docker/blob/master/docker-entrypoint.sh):
#!/bin/bash
#
# Copyright 2017 StreamSets Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
set -e
# We translate environment variables to sdc.properties and rewrite them.
set_conf() {
  if [ $# -ne 2 ]; then
    echo "set_conf requires two arguments: <key> <value>"
    exit 1
  fi

  if [ -z "$SDC_CONF" ]; then
    echo "SDC_CONF is not set."
    exit 1
  fi

  grep -q "^$1" ${SDC_CONF}/sdc.properties && sed 's|^#\?\('"$1"'=\).*|\1'"$2"'|' -i ${SDC_CONF}/sdc.properties || echo -e "\n$1=$2" >> ${SDC_CONF}/sdc.properties
}
# support arbitrary user IDs
# ref: https://docs.openshift.com/container-platform/3.3/creating_images/guidelines.html#openshift-container-platform-specific-guidelines
if ! whoami &> /dev/null; then
  if [ -w /etc/passwd ]; then
    echo "${SDC_USER:-sdc}:x:$(id -u):0:${SDC_USER:-sdc} user:${HOME}:/sbin/nologin" >> /etc/passwd
  fi
fi
# In some environments such as Marathon $HOST and $PORT0 can be used to
# determine the correct external URL to reach SDC.
if [ ! -z "$HOST" ] && [ ! -z "$PORT0" ] && [ -z "$SDC_CONF_SDC_BASE_HTTP_URL" ]; then
export SDC_CONF_SDC_BASE_HTTP_URL="http://${HOST}:${PORT0}"
fi
for e in $(env); do
  key=${e%=*}
  value=${e#*=}
  if [[ $key == SDC_CONF_* ]]; then
    lowercase=$(echo $key | tr '[:upper:]' '[:lower:]')
    key=$(echo ${lowercase#*sdc_conf_} | sed 's|_|.|g')
    set_conf $key $value
  fi
done
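# For example (illustrative, not part of the original script): setting the
# environment variable SDC_CONF_SDC_BASE_HTTP_URL=http://myhost:18630 results
# in the line sdc.base.http.url=http://myhost:18630 in ${SDC_CONF}/sdc.properties.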
# MODIFICATIONS:
#exec "${SDC_DIST}/bin/streamsets" "$@"
check_data_collector_status () {
  # Poll the Data Collector REST API once per second until it responds
  until ${SDC_DIST}/bin/streamsets cli -U http://localhost:18630 ping 2> /dev/null | grep -q 'version'; do
    sleep 1
  done
  echo "Data Collector has started!"
  import_pipeline
}
import_pipeline () {
  sleep 1
  echo "Start to import pipeline"
  ${SDC_DIST}/bin/streamsets cli -U http://localhost:18630 -u admin -p admin store import -n myPipeline --stack -f /pipelinejsonlocation/myPipeline.json
  echo "Finished importing pipeline"
}
# Start the readiness check in the background and start Data Collector in the foreground
check_data_collector_status & exec "${SDC_DIST}/bin/streamsets" "$@"
I commented out the last line exec "${SDC_DIST}/bin/streamsets" "$@" of the default docker-entrypoint.sh and added two functions: check_data_collector_status () pings the Data Collector service until it is available, and import_pipeline () then imports your pipeline.
check_data_collector_status () runs in the background, while exec "${SDC_DIST}/bin/streamsets" "$@" starts the Data Collector in the foreground as before. So the pipeline is imported as soon as the Data Collector service is up.
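With this in place you can build and run the image exactly as in your question and follow the container logs to verify the import:
docker build -t cmp/sdc .
docker run -p 18630:18630 -d --name sdc cmp/sdc
docker logs -f sdc   # wait for "Finished importing pipeline"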