
Snakemake Stripping Trailing Slash From Input


I have a workflow which performs a number of QC tasks on the contents of a directory. This directory is therefore the only input, and I invoke the workflow as follows:

snakemake --snakefile Snakefile \
  --use-conda \
  --config run_dir=/path/to/my/data/ \
  --slurm

I immediately turn this into a variable and have a function to sanitise it. If the run_dir item is given without a trailing slash, that breaks some logic in the workflow. I therefore have a simple function to add it if it is not provided (since I expect people other than me to use this workflow):

def sanitise_run_dir(run_dir):
    # Append a trailing slash if it is missing.
    if not run_dir.endswith("/"):
        run_dir += "/"
    return run_dir

RUN_DIR = sanitise_run_dir(config['run_dir'])

I've tried forcing it to output the value of this variable and indeed, it includes the trailing slash as expected.

One of my rules then takes this directory path as input to a custom Python script:

rule gather_metrics:
    input:
        INSTRUMENT,
        RUN_DIR
    output:
        get_library_metrics(),
        get_projected_metrics()
    script:
        "scripts/parse_key_metrics.py"

So my nicely sanitised RUN_DIR variable should be passed as input.

However, I get an error from Snakemake. As you can see, it is stripping the trailing slash before passing RUN_DIR to my script:

Error in rule gather_metrics:
    message: SLURM-job '422068' failed, SLURM status is: 'FAILED'
    jobid: 3
    input: myInstrument, /path/to/my/data
    output: <omitted_for_SO_post>, <omitted_for_SO_post>
    log: .snakemake/slurm_logs/rule_gather_metrics/422068.log (check log file(s) for error details)

Exiting because a job execution failed. Look above for error message

This subsequently causes my script to fail, since it uses str.split('/') to begin parsing the directory name.

The workflow and script used to work fine. Since then, I have been trying to get it to work with Slurm, so that is one change.

I also switched Snakemake version from 5.26.1 to 7.32.4 in order to be able to use Slurm.

In truth, I intend to simply squash the bug by re-sanitising the input in the script parse_key_metrics.py. However, I would prefer to understand why it's happening.
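
For illustration, here is a minimal sketch of how the trailing slash changes what str.split('/') returns, together with one way the script could re-sanitise its input. The script's actual parsing is not shown in this post, so the helper name last_component is hypothetical, and the sketch assumes the script only needs the final directory name.

```python
from pathlib import PurePosixPath

with_slash = "/path/to/my/data/"
without_slash = "/path/to/my/data"

# A trailing slash yields an extra empty final element, so any index
# counted from the end shifts by one once the slash has been stripped.
print(with_slash.split("/"))     # ['', 'path', 'to', 'my', 'data', '']
print(without_slash.split("/"))  # ['', 'path', 'to', 'my', 'data']


def last_component(path):
    """Hypothetical helper: final directory name, with or without a trailing slash."""
    return PurePosixPath(path).name


print(last_component(with_slash))     # data
print(last_component(without_slash))  # data
```

Parsing with pathlib rather than a raw split means the script no longer depends on whether the slash survives.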

Why does Snakemake exhibit this slash-stripping behaviour and can it be suppressed?

Am I doing something which violates the spirit of Snakemake by passing a directory path as input?


Solution

  • My understanding is that this is deliberate and cannot be suppressed. For efficiency when building the DAG, newer versions of Snakemake build a cache of information about the input and output files. For the cache to work properly it needs consistent keys, so all paths are canonicalised. There are probably other reasons too, to do with remote file providers and the like.

    Given this simple Snakefile:

    rule rewl:
        input:  "./foo//bar//"
        params: "./foo//bar//"
        shell:
            "echo {input} {params}"
    

    Snakemake complains about the ./ and the //, and then silently discards the trailing slashes in the {input}. However the {params} can be whatever you like and all the slashes are preserved.

    Having a directory as input to a rule is fine, and fully supported, but you should not put "/" on the end of the path. If you want to fix things without extra sanitising in your script you could pass RUN_DIR and/or INSTRUMENT as params.
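
A rough sketch of the params approach follows. The keyword names instrument and run_dir are my own choice, not something taken from the question, and the stand-in snakemake object exists only so the snippet runs outside a workflow. The rule change is shown in comments because rule syntax is not plain Python.

```python
# Snakefile side: keep the inputs as they are and add named params.
#
#   rule gather_metrics:
#       input:
#           INSTRUMENT,
#           RUN_DIR
#       params:
#           instrument=INSTRUMENT,
#           run_dir=RUN_DIR
#       output:
#           get_library_metrics(),
#           get_projected_metrics()
#       script:
#           "scripts/parse_key_metrics.py"
#
# Script side (scripts/parse_key_metrics.py):
from types import SimpleNamespace

# Snakemake injects a global `snakemake` object when it runs the script.
# The stand-in below is an assumption so this sketch also runs on its own.
if "snakemake" not in globals():
    snakemake = SimpleNamespace(
        params=SimpleNamespace(
            instrument="myInstrument",
            run_dir="/path/to/my/data/",
        )
    )

# Read the values from params instead of input, so the string arrives
# exactly as it was written in the Snakefile.
run_dir = snakemake.params.run_dir
instrument = snakemake.params.instrument

print(instrument, run_dir)
```

Leaving RUN_DIR in input as well means Snakemake still tracks the directory as a dependency, while the script reads the untouched string from params.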