Snakemake How to design downstream rules on empty output files?

I am wondering how to deal with empty output files in downstream rules. An assembly of a short read fastq data with SHOVILL can fail and produce a 0-byte contigs.fa

If genome annotation with PROKKA is run on a 0-byte file it returns an error:

 'input.fasta' is not a readable non-empty FASTA file

Snakemake rules for assembly and annotation :

rule shovill:
    input:
        fw="input/{fastq}_R1.fastq.gz",
        rv="input/{fastq}_R2.fastq.gz"
    output:
        directory("results/shovill/{sample}")
    threads: 16
    shell:
        "shovill --cpus {threads} --R1 {input.fw} --R2 {input.rv} --outdir {output}\
            2> {log}"

My current solution is to use || true and generate the result directory preliminary.

rule prokka:
    input:
        "results/shovill/{sample}/contigs.fa"
    output:
        directory("results/prokka/{sample}")
    threads: 8
    params:
        prefix="{sample}",
        gcode=config["genetic_code"],
        outdir="results/prokka/{sample}"
    shell:
        """
        mkdir -p {params.outdir}
        prokka --cpus {threads}  --force {input} &> {log} || true
        """

Solution

I can think of two approaches, neither is perfect.

The first is basically what you did: use bash to work around. However, I would suggest the -s file test operator. This way, you still get notified of a genuine error from prokka:

shell:
    """
    if [ -s {input} ]; then
       prokka --cpus {threads}  --force {input} > {log} 
    else;
       mkdir -p {params.outdir}
    """

Alternatively, use checkpoints. This puts all the logic in snakemake, but it's a little cumbersome. Something like this, but I'm not sure if this is 100% right, or the best version:

checkpoint gather_outputs:
    input: expand("results/shovill/{sample}/contigs.fa", sample=samples)
    output: 'results/shovill/non_empty.list'
    shell: 
        with open(str(output), 'wt') as output_handle:
            for input_file in input:
                if os.path.exists(input_file) and os.path.getsize(input_file):
                    sample = os.path.basename(os.path.dirname(input_file))
                    output_handle.write(f"{sample}\n")

def get_prokka_outputs(wildcards):
    # this makes the rest of the workflow wait for the checkpoint
    chk_out = checkpoints.gather_outputs.get().output
    with open(chk_out) as sample_lines:
        samples = [line.strip() for line in sample_lines]

    return expand("results/prokka/{sample}", sample=samples)

You might be able to do a per-sample checkpoint, but I've never tried it.

The more I think about it, checkpoints are too much overhead for your situation.