
Luigi does not send error codes to Concourse CI


I have a test pipeline on Concourse with one job that runs a set of luigi tasks. My problem is that failures in the luigi tasks do not propagate up to the Concourse job: if a luigi task fails, Concourse does not register the failure and reports that the job completed successfully. I will first post the code I am running, then the solutions I have tried.

luigi-tasks.py

import luigi
from tasks import Task1, Task2, Task3

class Pipeline1(luigi.WrapperTask):
    def requires(self):
        yield Task1()
        yield Task2()
        yield Task3()

tasks.py

import luigi
import pandas as pd

class Task1(luigi.Task):
    def requires(self):
        return None

    def output(self):
        return luigi.LocalTarget('stuff/task1.csv')

    def run(self):
        # uncomment the line below to generate a task failure
        # assert True == False
        print('task 1 complete...')
        t = pd.DataFrame()
        with self.output().open('w') as outtie:
            outtie.write('complete')

# Tasks 2 and 3 are duplicates of this, but with 1s replaced with 2s or 3s.

config file

[retcode]
# codes are in increasing level of severity (for most applications)
already_running=10
missing_data=20
not_run=25
task_failed=30
scheduling_error=35
unhandled_exception=40
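The comment about increasing severity matters when several conditions apply to the same run. Below is a minimal sketch of how the codes above might combine into one exit status. It assumes the rule is "exit with the highest applicable code" and that an option missing from the config counts as 0; both are assumptions read off the config comment and the answer below, not taken from luigi's source. pick_exit_code and QUESTION_CONFIG are hypothetical names.

```python
# Hypothetical sketch: combine the conditions seen during a run into one exit
# code. "Highest applicable code wins" and "missing option counts as 0" are
# assumptions, not luigi's documented behaviour.

def pick_exit_code(events, retcodes):
    """events: condition names that occurred, e.g. {"task_failed"}.
    retcodes: [retcode] options as a dict; absent options count as 0."""
    return max((retcodes.get(name, 0) for name in events), default=0)


# The [retcode] section from the question, as a dict.
QUESTION_CONFIG = {
    "already_running": 10,
    "missing_data": 20,
    "not_run": 25,
    "task_failed": 30,
    "scheduling_error": 35,
    "unhandled_exception": 40,
}

print(pick_exit_code({"task_failed"}, {}))                               # 0: config not found
print(pick_exit_code({"task_failed"}, QUESTION_CONFIG))                  # 30
print(pick_exit_code({"missing_data", "task_failed"}, QUESTION_CONFIG))  # 30: most severe wins
```

Under these assumptions, the same failed task yields 0 or 30 depending only on whether the [retcode] section was loaded, which matches the local (30) versus Concourse (0) difference described below.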

begin.sh

#!/bin/sh
set -e
export PYTHONPATH='.' 
luigi --module luigi-tasks Pipeline1 --local-scheduler
echo $?
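Because begin.sh starts with set -e, its echo $? line can only ever print 0: if the luigi command exits non-zero, the shell stops at that command and the script exits with luigi's code, which Concourse would report as a failure. So a 0 from echo $? (and a green job) means luigi itself returned 0. The minimal Python sketch below runs three stand-in versions of the script through /bin/sh; the stand-in commands (sh -c 'exit 30' for a failing luigi run, true for a passing one) are hypothetical and no luigi is involved.

```python
import subprocess

# Hypothetical stand-ins for begin.sh. `sh -c 'exit 30'` plays a luigi call
# returning task_failed=30; `true` plays a luigi call returning 0.
FAILING_WITH_SET_E = "set -e\nsh -c 'exit 30'\necho $?\n"
PASSING_WITH_SET_E = "set -e\ntrue\necho $?\n"
FAILING_WITHOUT_SET_E = "sh -c 'exit 30'\necho $?\n"


def run_sh(script):
    """Run a script body with /bin/sh; return (exit status, captured stdout)."""
    result = subprocess.run(["/bin/sh", "-c", script], capture_output=True, text=True)
    return result.returncode, result.stdout


print(run_sh(FAILING_WITH_SET_E))     # (30, '')    echo never runs, script exits 30
print(run_sh(PASSING_WITH_SET_E))     # (0, '0\n')  echo runs and can only print 0
print(run_sh(FAILING_WITHOUT_SET_E))  # (0, '30\n') echo prints 30, but its own status 0 is the script's
```

The third case shows why dropping set -e would not help: the trailing echo becomes the last command, so the script exits 0 regardless of what luigi returned.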

pipeline.yml

# <resources, resource types, and docker image build job defined here>

#job of interest
- name: run-docker-image
  plan:
  - get: timer
    trigger: true
  - get: docker-image-ecr
    passed: [build-docker-image]
  - get: run-git
  - task: run-script
    image: docker-image-ecr
    config:
      inputs:
      - name: run-git
      platform: linux
      run:
        dir: ./run-git
        path: /bin/bash 
        args: ["begin.sh"]
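Concourse decides whether the run-script task passed purely from the exit status of the process in run.path: 0 means the task succeeded, anything else fails it (and the job). A rough Python emulation of the run: section above, where run_concourse_task is a hypothetical helper, /bin/sh is swapped in for /bin/bash, and the begin.sh written here is a stand-in that only exits with the task_failed code:

```python
import os
import subprocess
import tempfile


def run_concourse_task(run, workdir):
    """Emulate the task's run: section: execute path with args inside
    workdir/dir and map the exit status to a task result."""
    cwd = os.path.join(workdir, run.get("dir", "."))
    status = subprocess.run([run["path"], *run.get("args", [])], cwd=cwd).returncode
    # Concourse's rule: exit status 0 succeeds, any other status fails the task.
    return ("succeeded" if status == 0 else "failed", status)


# Mirrors the run: block of run-script, with /bin/sh in place of /bin/bash.
RUN = {"dir": "./run-git", "path": "/bin/sh", "args": ["begin.sh"]}

with tempfile.TemporaryDirectory() as build_dir:
    os.makedirs(os.path.join(build_dir, "run-git"))
    # Hypothetical begin.sh standing in for a luigi run that returns 30.
    with open(os.path.join(build_dir, "run-git", "begin.sh"), "w") as f:
        f.write("exit 30\n")
    print(run_concourse_task(RUN, build_dir))  # ('failed', 30)
```

In other words, Concourse does not inspect luigi's output at all; the job only turns red if the luigi command's non-zero code reaches the end of begin.sh.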

I've introduced errors in several ways: an assertion or a raised exception (ValueError) inside an individual task's run() method and inside the wrapper task, and sys.exit(luigi.retcodes.retcode().unhandled_exception). I also tried making all tasks fail, in case the error needed to be generated in a specific manner or location. Every approach produced a failed luigi task, but none of them caused the job to fail on the Concourse server.

At first, I thought Concourse simply reports success as long as it can run the file or command given to it, but I'm not sure it's that simple. Interestingly, when I run the pipeline on my local computer (luigi --module luigi-tasks Pipeline1 --local-scheduler), I get the expected return code (e.g. 30), but when I run it on the Concourse server, I get a return code of 0 after the luigi tasks complete (from the echo $? in the shell script).

Would appreciate any insight into this problem.


Solution

  • My suspicion is that luigi doesn't see your config file with the return codes. Without it, luigi's default return codes for failed tasks are 0, so the command exits with 0 whether tasks fail or succeed.

    This experiment should help you debug that:

    1. Force a failed job: add exit 1 at the end of begin.sh, so there is a failed build to hijack.
    2. Hijack the job: fly -t <target> i -j <pipeline>/<job> (i is the short alias of fly intercept, also known as hijack), then select run-script.
    3. Reproduce the run inside the container: cd ./run-git; /bin/bash begin.sh
    4. Ensure the luigi config file is present and named appropriately, e.g. luigi.cfg.
    5. Re-run the script with the config path set explicitly: LUIGI_CONFIG_PATH=luigi.cfg bash ./begin.sh
    6. Check the exit code: echo $?
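Steps 4 and 5 can also be checked from Python. Below is a minimal sketch of which [retcode] values a luigi run started in a given directory would pick up. It assumes the search order from luigi's configuration docs (/etc/luigi/luigi.cfg, then luigi.cfg in the working directory, then LUIGI_CONFIG_PATH, with later files overriding earlier ones) and leaves out the deprecated client.cfg names. candidate_configs and find_retcode_config are hypothetical helpers, not part of luigi.

```python
import configparser
import os
import tempfile


def candidate_configs(cwd, env):
    """Config files luigi is assumed to read, in order (later ones win)."""
    paths = ["/etc/luigi/luigi.cfg", os.path.join(cwd, "luigi.cfg")]
    extra = env.get("LUIGI_CONFIG_PATH")
    if extra:
        # A relative LUIGI_CONFIG_PATH is resolved against the working directory.
        paths.append(os.path.join(cwd, extra))
    return paths


def find_retcode_config(cwd, env):
    """Return the [retcode] options a run started in cwd would see, or {}."""
    parser = configparser.ConfigParser()
    parser.read(candidate_configs(cwd, env))  # files that don't exist are skipped
    if not parser.has_section("retcode"):
        return {}
    return {key: parser.getint("retcode", key) for key in parser["retcode"]}


# Usage: a checkout without luigi.cfg versus one with it.
with tempfile.TemporaryDirectory() as repo:
    # Empty unless /etc/luigi/luigi.cfg happens to define [retcode].
    print(find_retcode_config(repo, {}))
    with open(os.path.join(repo, "luigi.cfg"), "w") as f:
        f.write("[retcode]\ntask_failed=30\n")
    print(find_retcode_config(repo, {}))
```

Running something like this from ./run-git inside the hijacked container would show whether the config file with the return codes is actually where luigi is assumed to look.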