bashapache-kafkadocker-composecircleci

grep over docker compose logs returns 255 exit code unexpectedly in until loop on CircleCI


For my application, I am using kafka, and I wanted a way to make sure that I was properly awaiting it being ready before trying to run the application e2e tests in our CI test suite. To accomplish this, I used the until bash keyword and looked for a line in the logs to indicate it was ready, started (kafka.server.KafkaServer). The full setup is:

until docker compose logs kafka 2>/dev/null | grep -q "started (kafka.server.KafkaServer)"; do
  sleep 1
done

I am finding that if this loop runs before kafka has been fully started and is ready, then the until loops as expected until kafka is properly ready. However, if this loop runs after the docker containers are started, then the loop never finishes, and from my tests, I get a consistent 255 error code from it. However, if I add the same grep line within my loop, I do see that the line is consistently found and I get $? = 0, so I'm not sure where lies the problem.

For reference, here's the full CircleCI commands I'm running that's hitting the problem:

      - run:
          name: 'Start background services'
          working_directory: services
          background: true
          command: docker-compose up
      - run:
          name: 'Build images for app'
          working_directory: app
          command: docker-compose build
      - run:
          name: 'Wait for services'
          working_directory: services
          command: |
            until docker compose logs kafka | grep -q "started (kafka.server.KafkaServer)"; do
              sleep 1
            done
            echo "Kafka is ready"

I get an error in the until loop never finishing if the build images step takes >= 1min, while the loop works fine if the images are fully cached.


Solution

  • I ended up with the following code to accomplish waiting for the kafka service in CircleCI:

              command: |
                timeout=120
                counter=0
                until docker compose logs kafka 2>/dev/null | grep "started (kafka.server.KafkaServer)" > /dev/null; do
                  counter=$((counter+1))
                  if [ ${counter} -gt ${timeout} ]; then
                    echo "Kafka not ready after ${timeout} seconds"
                    exit 1
                  fi
                  sleep 1
                done
                echo "Kafka is ready - ${counter}"
    

    No idea why grep -q would always fail and I just assume it's something wonky with the CircleCI runner, and didn't dig any deeper.