Given an input, I have a cheap function and an expensive function; each of these is modeled as a Concourse task.
If two invocations of the cheap function have the same output, I know that two invocations of the expensive function will likewise have the same output.
How can I set up a pipeline that only runs the expensive function when the result of the cheap function changes?
For the sake of an example, let's say that the cheap function strips comments and whitespace from a codebase and then calculates a checksum, whereas the expensive function actually runs the code it contains. My goal, in this scenario, is to not bother building any revision that differs from the prior one only in comments or whitespace.
I've considered using a git resource and (in our example) storing a hash of the preprocessor output for each compilation target in its own file, so that the task doing the actual compilation (and applicable unit tests) can trigger on changes to the file holding the hash of its inputs. Having a separate git resource that maintains historical hashes indefinitely seems like overkill, though. Is there a better approach?
This is similar to "Have Concourse only build new docker containers on file diff not on commit", but I'm trying to test whether the result of running a function against a file changes, so as to trigger only on changes that could modify build results rather than on all possible changes. (The proposal described above, creating an intermediary repo with outputs from the cheap function, would effectively be using the answers to that question as one of its components, but I'm hoping there's an option with fewer moving parts.)
Consider using put nested in the try: modifier.
The cheap job takes two inputs: code-repo and hash (the latter is mapped to last-hash inside the task).
On every commit to code-repo, the cheap job reads the last-hash input, mapped from hash, and compares it to the computation result (in the silly example below, the contents of hash.txt checked into the root of code-repo).
If it determines that the hash value from the incoming commit differs from the previously recorded hash value, it populates the put param hash/hash.txt with the new hash value. This results in a new put to the resource, which in turn triggers the expensive job.
If no change is detected, the put attempt will fail because the file named in the put param will not exist, but try: swallows that failure, so the overall cheap job will still succeed.
resources:
- name: code-repo
  type: git
  source:
    branch: master
    private_key: ((key))
    uri: git@github.com:myorg/code-repo.git
- name: hash
  type: s3
  source:
    access_key_id: ((aws_access))
    secret_access_key: ((aws_secret))
    region_name: ((aws_region))
    bucket: my-versioned-aws-bucket
    versioned_file: hash/hash.txt

jobs:
- name: cheap
  plan:
  - get: code-repo
    trigger: true
  - get: hash
  - task: check
    # Expose the previously stored hash to the task as "last-hash",
    # leaving the name "hash" free for the task's output.
    input_mapping:
      last-hash: hash
    config:
      platform: linux
      image_resource:
        type: docker-image
        source: { repository: alpine }
      inputs:
      - name: code-repo
      - name: last-hash
      outputs:
      - name: hash
      run:
        path: /bin/sh
        args:
        - -c
        - |
          LAST="$(cat last-hash/hash.txt)"
          NEW="$(cat code-repo/hash.txt)"
          # Only write the output file when the hash changed.
          if [ "$LAST" != "$NEW" ]; then
            cp code-repo/hash.txt hash/hash.txt
          fi
    on_success:
      # The put fails when hash/hash.txt was not written; try: ignores that.
      try:
        put: hash
        params:
          file: hash/hash.txt
- name: expensive
  plan:
  - get: hash
    trigger: true
    passed: [ cheap ]
Note: you must populate the initial state file in S3 with some value, or the cheap job won't start, because its get: hash step will have no version to fetch.
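The check task above takes its hash from a hash.txt checked into the repository, which is only a stand-in for the real cheap function. As a rough sketch of what that function could look like for the comment-and-whitespace example from the question, the script below computes a checksum over the sources after dropping full-line # comments, blank lines, and leading or trailing whitespace, and writes it to the output file only when it differs from the last recorded value. The function names normalized_checksum and update_hash_if_changed are made up for illustration, the # comment syntax is an assumption about the codebase, and sha256sum is assumed to be present in the task image.

```shell
# Hypothetical helper: checksum of all files under $1 after a crude normalization.
# Assumption: comments are full lines starting with '#'; a real normalizer
# would need to understand the language being built.
normalized_checksum() {
    find "$1" -type f ! -path '*/.git/*' | LC_ALL=C sort | while IFS= read -r f; do
        sed -e '/^[[:space:]]*#/d' \
            -e 's/^[[:space:]]*//' \
            -e 's/[[:space:]]*$//' \
            -e '/^$/d' "$f"
    done | sha256sum | cut -d' ' -f1
}

# Hypothetical helper: write the new checksum to $3 only if it differs from
# the one stored in $2, so a later put finds a file only on a real change.
# $1: source directory, $2: file with the last hash, $3: output file.
update_hash_if_changed() {
    new="$(normalized_checksum "$1")"
    last="$(cat "$2" 2>/dev/null || true)"
    if [ "$new" != "$last" ]; then
        printf '%s\n' "$new" > "$3"
    fi
}
```

With these definitions in place, the body of the check script would reduce to update_hash_if_changed code-repo last-hash/hash.txt hash/hash.txt. File names themselves are not hashed in this sketch, so it only notices changes in the normalized contents.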