I'm writing a makefile.
I have a data file (`data.txt`) and I use `gpt` to process it. Because of token limits, I have to break the data into segments (rule `segments`), process each segment with the AI (rule `ai`), and then assemble the segments together (rule `assemble`).
```makefile
segments :
	break --size 20M ./data.txt seg-%d.txt
ai : segments
	for f in seg-*; do
	# outputfile is ai-output-%d
	gpt $f
	done
assemble : ai
	concat --output final.txt -- ai-output-*
```
This makefile works when I call `make assemble`.

However, `gpt` is not very stable. Sometimes I have to call it again to get a better output. Then, if I call `make assemble`, make re-runs rule `ai` and overwrites my preferred output.
I would like to use this makefile in two cases:

1. I call `make assemble`, and `final.txt` is created.
2. After re-running `gpt` on some segments, I call `make assemble`, and `final.txt` is created from my revised output.

How can I change my makefile to allow these two cases?
There are several issues with your Makefile:
By default the recipes (the commands of your rules) are executed by the `sh` shell, and in `sh`, `break` is a built-in command that exits the enclosing loop. Unless you tell `make` to use a shell in which `break` is not a built-in (by assigning the `SHELL` special variable), your first rule will likely fail.
By default each line of a recipe is executed by a different invocation of the shell. Unless you write its recipe on a single line, your second rule will likely fail.
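To see this effect, here is a minimal shell sketch, assuming GNU `make` 3.82 or later and a POSIX `sh`. It writes a two-line recipe into a scratch directory and runs it; the target name `where` is made up, and `.RECIPEPREFIX` is set only so the recipe lines can start with `>` instead of a literal tab.

```shell
#!/bin/sh
# Sketch, assuming GNU make >= 3.82: every recipe line gets its own shell,
# so the `cd /` on the first line is forgotten by the second line.
set -eu
demo=$(mktemp -d)
cd "$demo"
# .RECIPEPREFIX lets recipe lines start with '>' instead of a literal tab.
cat > Makefile <<'EOF'
.RECIPEPREFIX = >
where:
> cd /
> pwd
EOF
out=$(make -s where)
echo "$out"   # the scratch directory, not /
```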
Before passing a recipe to the shell, `make` expands it. So, if you want to use shell variables, as in your second rule, you must protect them from this first expansion by `make`: use `$$f`, not `$f`. But, as you will see below, you don't need a shell `for` loop to generate the `ai-output-%` files from the `seg-%.txt` files; `make` pattern rules were designed exactly for this kind of situation.
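A minimal sketch of the difference between `$f` and `$$f`, assuming GNU `make` 3.82 or later; the target `loop` is made up. `make` expands `$f` as its own variable `f`, which is undefined and therefore empty, while `$$f` reaches the shell as `$f`:

```shell
#!/bin/sh
# Sketch, assuming GNU make >= 3.82: make expands $f itself (to nothing,
# since no make variable f exists), while $$f reaches the shell as $f.
set -eu
demo=$(mktemp -d)
cd "$demo"
# .RECIPEPREFIX lets recipe lines start with '>' instead of a literal tab.
cat > Makefile <<'EOF'
.RECIPEPREFIX = >
loop:
> for f in a b; do echo "[$f]"; done
> for f in a b; do echo "[$$f]"; done
EOF
out=$(make -s loop)
printf '%s\n' "$out"
# []
# []
# [a]
# [b]
```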
`make` needs to know which files are generated by the rules in order to decide whether they are up to date (by comparing the last modification times of target files and prerequisite files). If you hide this essential information from `make`, it cannot do its job properly: it may rebuild what is up to date, or not rebuild what is out of date.
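A minimal sketch of that timestamp comparison, assuming GNU `make` 3.82 or later. The files `in.txt`, `out.txt` and `build.log` are made up for the demo; `build.log` only counts how many times the recipe ran.

```shell
#!/bin/sh
# Sketch, assuming GNU make >= 3.82: make reruns a recipe only when the
# target is missing or older than one of its prerequisites.
set -eu
demo=$(mktemp -d)
cd "$demo"
# .RECIPEPREFIX lets recipe lines start with '>' instead of a literal tab.
cat > Makefile <<'EOF'
.RECIPEPREFIX = >
out.txt: in.txt
> cp in.txt out.txt
> echo built >> build.log
EOF
echo hello > in.txt
make -s out.txt   # out.txt is missing: the recipe runs
make -s out.txt   # out.txt is newer than in.txt: nothing to do
sleep 1           # make sure the next change gets a newer timestamp
touch in.txt      # the prerequisite is now newer than the target
make -s out.txt   # the recipe runs again
wc -l < build.log
```

`build.log` ends up with two lines: the second invocation did nothing because `out.txt` was already newer than `in.txt`.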
`make` runs in two phases: during the first it builds the tree of dependencies, and during the second it builds whatever needs to be built. If the tree of dependencies changes during the second phase, because files are created or deleted, `make` cannot update its plans; it's too late.
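The two phases can be observed with a small sketch, assuming GNU `make` 3.82 or later; the targets `all` and `gen` are made up. `SEGS` is computed while the Makefile is parsed, so files that a recipe creates later in the same run are invisible to it; only a fresh `make` sees them.

```shell
#!/bin/sh
# Sketch, assuming GNU make >= 3.82: a := variable is evaluated during the
# first (parsing) phase and never sees files created by recipes afterwards.
set -eu
demo=$(mktemp -d)
cd "$demo"
# .RECIPEPREFIX lets recipe lines start with '>' instead of a literal tab.
cat > Makefile <<'EOF'
.RECIPEPREFIX = >
.PHONY: all gen
SEGS := $(wildcard seg-*.txt)
all: gen
> echo "segments seen: $(words $(SEGS))"
gen:
> touch seg-1.txt seg-2.txt
EOF
first=$(make -s all)    # seg files are created during the second phase
second=$(make -s all)   # a fresh make finds them during its first phase
printf '%s\n%s\n' "$first" "$second"
# segments seen: 0
# segments seen: 2
```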
To solve all these issues, with GNU `make`¹, you could try something like:
```makefile
.PHONY: segments ai assemble clean

SEGS := $(wildcard seg-*.txt)
AIS := $(patsubst seg-%.txt,ai-output-%,$(SEGS))

segments: .segs.done

.segs.done: ./data.txt
	rm -f seg-*.txt
	my_break --size 20M $< seg-%d.txt
	touch $@

ai: $(AIS)

$(AIS): ai-output-%: seg-%.txt
	gpt $<

assemble: .segs.done
	$(MAKE) final.txt

final.txt: $(AIS)
	concat --output $@ -- $^

clean:
	rm -f seg-*.txt .segs.done ai-output-* final.txt
```
Explanations, in the same order as the above list of issues:
We use `my_break` instead of `break`.
We don't use multi-line recipes. If we did, we would write them on a single line (adding `;`, `&&`, `||`, `|`, etc. to join the commands when needed). If that line got too long, we could split it with line continuations (a `\` at the end of each line). Example:
```makefile
ai: segments
	for f in seg-*; do gpt $$f; done
```

Or:

```makefile
ai: segments
	for f in seg-*; do \
		gpt $$f; \
	done
```
But we don't need all this, because we use a static pattern rule to tell `make` how to build each `ai-output-N` file from the corresponding `seg-N.txt` file:
```makefile
$(AIS): ai-output-%: seg-%.txt
	gpt $<
```
A side benefit is that `make` can run parallel jobs and build several `ai-output-N` files at a time (try `make -j12` if you have 12 cores).
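A minimal sketch of the static pattern rule at work, assuming GNU `make` 3.82 or later. Here `tr` stands in for `gpt` purely for illustration (it upper-cases each segment), and the four sample segments are made up.

```shell
#!/bin/sh
# Sketch, assuming GNU make >= 3.82. `tr` is a hypothetical stand-in for gpt.
set -eu
demo=$(mktemp -d)
cd "$demo"
# .RECIPEPREFIX lets recipe lines start with '>' instead of a literal tab.
cat > Makefile <<'EOF'
.RECIPEPREFIX = >
.PHONY: ai
SEGS := $(wildcard seg-*.txt)
AIS := $(patsubst seg-%.txt,ai-output-%,$(SEGS))
ai: $(AIS)
# Static pattern rule: each ai-output-N is built from seg-N.txt.
$(AIS): ai-output-%: seg-%.txt
> tr 'a-z' 'A-Z' < $< > $@
EOF
for n in 0 1 2 3; do echo "segment $n" > "seg-$n.txt"; done
make -s -j4 ai       # up to four ai-output-N files are built in parallel
cat ai-output-2      # SEGMENT 2
```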
We don't use shell variables in recipes. If we did, we would write `$$` instead of `$` (at least), as in the above example.
We use phony targets as convenient names for groups of similar targets (e.g., `make ai` builds all the `ai-output-N` files), but we explicitly tell `make` which files each non-phony rule produces.
There is only one exception: the `my_break` rule, for which we don't know which files are produced. To solve this problem we use a common trick: create (or update) an empty marker file (`.segs.done`) with a final `touch` to "remember" the last time we ran `my_break`. This way, by comparing the last modification times of `data.txt` and `.segs.done`, `make` knows whether `my_break` should be run again.
Note that the recipe first deletes all existing `seg-N.txt` files, to avoid keeping out-of-date files around.
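A sketch of the marker-file trick in isolation, assuming GNU `make` 3.82 or later. POSIX `split -l 1` is a hypothetical stand-in for `my_break` (one segment per input line), and `split.log` only counts how many times the recipe ran.

```shell
#!/bin/sh
# Sketch, assuming GNU make >= 3.82. `split -l 1` stands in for my_break.
set -eu
demo=$(mktemp -d)
cd "$demo"
# .RECIPEPREFIX lets recipe lines start with '>' instead of a literal tab.
cat > Makefile <<'EOF'
.RECIPEPREFIX = >
.segs.done: data.txt
> rm -f seg-*
> split -l 1 data.txt seg-
> echo split >> split.log
> touch $@
EOF
printf 'a\nb\n' > data.txt
make -s .segs.done   # no marker yet: the data is split
make -s .segs.done   # marker newer than data.txt: nothing to do
sleep 1              # make sure the next edit gets a newer timestamp
echo c >> data.txt
make -s .segs.done   # data.txt changed: old segments removed, split again
echo seg-*           # seg-aa seg-ab seg-ac
```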
Note also that, during the first phase of `make`, we compute the list of existing `seg-N.txt` files with `wildcard` and store it in the make variable `SEGS`. We also compute the list of corresponding `ai-output-N` files with `patsubst` and store it in the make variable `AIS`. We use these variables everywhere else, so that the lists are accurate and no extra, out-of-date `ai-output-N` files end up in `final.txt`.
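A small sketch of why deriving `AIS` from `SEGS` is safer than the shell glob `ai-output-*` used in the question, assuming GNU `make` 3.82 or later. The stale file `ai-output-9` is made up, and `$(sort ...)` is added only to make the output order deterministic.

```shell
#!/bin/sh
# Sketch, assuming GNU make >= 3.82: patsubst only lists outputs that have a
# matching segment, while the shell glob also picks up stale leftovers.
set -eu
demo=$(mktemp -d)
cd "$demo"
# .RECIPEPREFIX lets recipe lines start with '>' instead of a literal tab.
cat > Makefile <<'EOF'
.RECIPEPREFIX = >
SEGS := $(sort $(wildcard seg-*.txt))
AIS := $(patsubst seg-%.txt,ai-output-%,$(SEGS))
show:
> echo "from patsubst: $(AIS)"
> echo "from the shell glob:" ai-output-*
EOF
touch seg-1.txt seg-2.txt
touch ai-output-9    # stale leftover with no matching segment
out=$(make -s show)
printf '%s\n' "$out"
# from patsubst: ai-output-1 ai-output-2
# from the shell glob: ai-output-9
```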
This one is probably the most difficult. To solve it, we must first update the `seg-N.txt` files, in case `data.txt` changed, and then restart `make` from scratch so that it discovers the new situation during its first phase. We thus call `make` from `make` (recursive `make`²). When building `assemble`, `make` first updates the `seg-N.txt` files if needed (because `.segs.done` is a prerequisite), and then calls itself to finish the job, starting from a state where all `seg-N.txt` files are present and up to date.
If, between two `make assemble` runs, you run `gpt` again to improve some `ai-output-N`, `make` notices it. It does not rebuild any `seg-N.txt` or `ai-output-N`, because it knows they are newer than their prerequisites, but it rebuilds `final.txt`, because it is older than some of its prerequisites.
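To check both use cases from the question end to end, here is a shell sketch that runs the Makefile above in a scratch directory, assuming GNU `make` 3.82 or later. `my_break`, `gpt` and `concat` are replaced by hypothetical stand-in scripts (one segment per input line, upper-casing, and plain `cat`), and the recipe lines use `.RECIPEPREFIX` only to avoid literal tabs. Editing `ai-output-1` by hand plays the role of re-running `gpt` until you like the answer.

```shell
#!/bin/sh
# End-to-end sketch of the answer's Makefile, assuming GNU make >= 3.82.
# my_break, gpt and concat are hypothetical stand-ins, not the real tools.
set -eu
demo=$(mktemp -d)
cd "$demo"
mkdir bin
PATH="$demo/bin:$PATH"

# Stand-in for my_break: one segment per input line.
# Called as: my_break --size SIZE INPUT FORMAT (SIZE is ignored).
cat > bin/my_break <<'EOF'
#!/bin/sh
n=0
while IFS= read -r line; do
  printf '%s\n' "$line" > "$(printf "$4" "$n")"
  n=$((n + 1))
done < "$3"
EOF

# Stand-in for gpt: seg-N.txt -> ai-output-N (upper-cased); logs each call.
cat > bin/gpt <<'EOF'
#!/bin/sh
n=${1#seg-}
n=${n%.txt}
tr 'a-z' 'A-Z' < "$1" > "ai-output-$n"
echo "$1" >> gpt.log
EOF

# Stand-in for concat: concat --output OUT -- FILE...
cat > bin/concat <<'EOF'
#!/bin/sh
out=$2
shift 3
cat "$@" > "$out"
EOF
chmod +x bin/my_break bin/gpt bin/concat

# The answer's Makefile; '>' replaces the leading tab via .RECIPEPREFIX.
cat > Makefile <<'EOF'
.RECIPEPREFIX = >
.PHONY: segments ai assemble clean
SEGS := $(wildcard seg-*.txt)
AIS := $(patsubst seg-%.txt,ai-output-%,$(SEGS))
segments: .segs.done
.segs.done: ./data.txt
> rm -f seg-*.txt
> my_break --size 20M $< seg-%d.txt
> touch $@
ai: $(AIS)
$(AIS): ai-output-%: seg-%.txt
> gpt $<
assemble: .segs.done
> $(MAKE) final.txt
final.txt: $(AIS)
> concat --output $@ -- $^
clean:
> rm -f seg-*.txt .segs.done ai-output-* final.txt
EOF

printf 'alpha\nbeta\n' > data.txt
make -s assemble                   # case 1: everything is built
sleep 1                            # give the revision a newer timestamp
echo 'REVISED BETA' > ai-output-1  # stands for a better gpt answer
make -s assemble                   # case 2: only final.txt is rebuilt
cat final.txt
```

`gpt.log` still has two lines at the end: the second `make assemble` rebuilt only `final.txt` and kept the revised `ai-output-1`.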
There are some extra features, like automatic variables (`$<`, `$@`, `$^`) or the use of `$(MAKE)` to call `make`. If needed, you will find explanations in the GNU `make` manual.
As noted in the comments, you may want to add the `.DELETE_ON_ERROR` special target somewhere, to automatically delete targets when the recipe that builds them fails. Don't rely on it too much, however: the `my_break` rule does not explicitly list the real `seg-N.txt` targets, so if `my_break` fails, `make` can at most delete `.segs.done`, not the partially written `seg-N.txt` files.
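A minimal sketch of `.DELETE_ON_ERROR`, assuming GNU `make` 3.82 or later; the failing recipe is made up. Without the special target, the half-written `final.txt` would stay on disk and look up to date on the next run.

```shell
#!/bin/sh
# Sketch, assuming GNU make >= 3.82: with .DELETE_ON_ERROR, a target whose
# recipe fails after modifying it is deleted by make.
set -eu
demo=$(mktemp -d)
cd "$demo"
# .RECIPEPREFIX lets recipe lines start with '>' instead of a literal tab.
cat > Makefile <<'EOF'
.RECIPEPREFIX = >
.DELETE_ON_ERROR:
# Hypothetical failing step: writes a partial file, then fails.
final.txt:
> echo partial > $@
> false
EOF
make -s final.txt || echo "make failed, as expected"
if [ -e final.txt ]; then echo "final.txt was kept"; else echo "final.txt was deleted"; fi
```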
¹ If your `make` is not GNU `make`, there are probably a few things to adapt.
² You may read here or elsewhere that "recursive `make` is considered harmful". Don't let such statements prevent you from using recursive `make` when it is really needed. Like many other features, recursive `make` can be harmful when it is used wrongly, but it is essential in some cases, like yours.