Snakemake version: 8.16.0
rule test:
    input:
        "input.txt"
    output:
        "output.txt"
    shell:
        """
        cat {input} > {output}
        """
My test steps:

1. input.txt is smaller than 100,000 bytes (e.g. 99,999 bytes). Run snakemake -c1; everything is ok and output.txt is created.
2. Run touch input.txt, then snakemake -c1 again. Snakemake reports "Nothing to be done (all requested files are present and up to date)." (This is not the result I expected.)
3. Modify input.txt so that its size is 100,000 bytes or more. Run snakemake -c1; rule test runs again, because the input file has changed.
4. Run touch input.txt (same as step 2), then snakemake -c1 again; rule test runs again, because the modification date of the input file has changed.

Why won't Snakemake re-run the workflow if I touch an input file smaller than 100,000 bytes? Is there a way to make Snakemake re-run whenever I touch any input file?

I have tried the above steps on two devices and got the same results.
The behaviour is coded in the Snakemake source here:
Note that the size cutoff is hard-coded and cannot be turned off. You might think that running Snakemake with the option --rerun-triggers mtime
would ignore the file checksum, but it does not.
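The behaviour described above can be sketched in a few lines of Python. This is a hypothetical simplification for illustration, not Snakemake's actual implementation; the only detail taken from the discussion above is the 100,000-byte cutoff and the checksum-vs-mtime split:

```python
import hashlib
import os

SIZE_CUTOFF = 100_000  # bytes; the hard-coded limit described above


def input_changed(path, recorded_checksum, recorded_mtime):
    """Simplified sketch (NOT Snakemake's real code) of the re-run decision:
    small files are compared by content checksum, large files by mtime."""
    if os.path.getsize(path) < SIZE_CUTOFF:
        # A bare `touch` updates mtime but not content, so the checksum
        # still matches and no re-run is triggered.
        with open(path, "rb") as f:
            current = hashlib.sha256(f.read()).hexdigest()
        return current != recorded_checksum
    # At or above the cutoff, only the modification time matters,
    # so touching the file is enough to trigger a re-run.
    return os.path.getmtime(path) > recorded_mtime
```

This makes the observed behaviour clear: for a 99,999-byte file, touching it changes the mtime but the checksum branch wins, so nothing is considered out of date.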
I think there was some discussion about this in one of the many, many open Snakemake issues, but I don't have the link to hand. At the very least, the behaviour should be properly documented.
There is a workaround that may be useful for you. Run Snakemake with the --drop-metadata option so that checksums are not recorded. Note that changes to the code will then also go untracked. This is basically the same as deleting the .snakemake directory between runs, except that you won't lose conda environments, locks, etc. I've put this in my own default profile, as the checksumming logic was causing me problems too. Possibly this will cause problems with incomplete jobs not being detected, but I tend to use shadow rules for anything where that is a concern.
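If you want to apply this by default via a profile, a config entry might look like the following. The profile path is just an example, and this assumes the usual convention that a command-line flag maps to a key of the same name in the profile's config.yaml:

```yaml
# config.yaml in a Snakemake profile directory,
# e.g. ~/.config/snakemake/default/ (example path; use your own profile)
drop-metadata: true
```

With this in place, plain snakemake invocations behave as if --drop-metadata were always passed.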