bioinformaticssnakemakesamtoolsbam

How to write rules for software with version specific shell calls? (E.g. Samtools 1.3.1 and 0.1.18)


Thoughts on how to account for a pipeline which can use versions of a software which requires a (slightly) different shell call?

At times, switching between versions with conda, the shell calls are different, E.g. Samtools 0.1.18 and Samtools 1.3.1. require different formatting for a prefix.

I've thought of three ways, and am looking for additional suggestions:

  1. I'll add a configuration variable in my YAML, use it to set the version number. Have a conditional such that only the matched version-rule is included. Similarly to the way I used a software choice flag here.

  2. I'll do the version detection within the rule, and change the shell call accordingly. In the example above the only difference is the addition of a '-T' argument. This is simple, but I fear eventually I'll arrive at a situation where it is more than just an additional argument.

  3. Write completely separate rules, with the version in the rule name, putting the onus on the user to pick the correct version. This will inevitable cause ambiguity conflicts, but I can handle those.

Reflection: I'm not convinced nesting conditionals is the best approach, as it makes maintenance more difficult, and it's really not that elegant. Having the detection in the rule isn't that bad, but I don't like the idea of the control flow being pushed into rules (right now it's all been removed from rules for my pipeline). I want to avoid creating ambiguity conflicts.

Am I missing supporting functionality? At first I thought it was this Snakemake's 'Version' directive, but it's not actually what I want. Either that or I'm missing how I am to utilize it.

Thoughts?

Supporting Documentation

Samtools 1.3.1

 (CentOS5-Compatible) [tboyarski@login3 bin]$ ./samtools sort
 Usage: samtools sort [options...] [in.bam]
 Options:
   -l INT     Set compression level, from 0 (uncompressed) to 9 (best)
   -m INT     Set maximum memory per thread; suffix K/M/G recognized [768M]
   -n         Sort by read name
   -o FILE    Write final output to FILE rather than standard output
   -T PREFIX  Write temporary files to PREFIX.nnnn.bam
   -@, --threads INT
              Set number of sorting and compression threads [1]
       --input-fmt-option OPT[=VAL]
                Specify a single input file format option in the form
                of OPTION or OPTION=VALUE
   -O, --output-fmt FORMAT[,OPT[=VAL]]...
                Specify output format (SAM, BAM, CRAM)
       --output-fmt-option OPT[=VAL]
                Specify a single output file format option in the form
                of OPTION or OPTION=VALUE
       --reference FILE
                Reference sequence FASTA FILE [null]

1.3.1 ---> config["samtools"] + ' sort -no -m ' + str(config["sortMem"]) + ' - -T' + str(log.stdErr)

Samtools 0.1.18

(CentOS5-Compatible) [tboyarski@login3 condaENV]$ samtools sort
Usage: samtools sort [-on] [-m <maxMem>] <in.bam> <out.prefix>

0.1.18 --> config["samtools"] + ' sort -no -m ' + str(config["sortMem"]) + ' - ' + str(log.stdErr)

The difference is the '-T' just before str(log.stdErr).


Solution

  • I'll go for option 2, using the tool version as a wildcard: You will have it in your output files path, which reinforces documentation, and will avoid trouble if you want to have the results corresponding to different versions of your tool.

    You can use a function of wildcards as params to determine part or even all of the command to pass to shell. If you don't generate the whole command, you can also use a function in params to select the correct tool.

    TOOL_VERSIONS = config["tool_versions"]
    
    
    rule all:
        expand("tool_results/{tool_version}/tool_output", tool_version=TOOL_VERSIONS)
    
    
    rule previous_rule: # I skip this part
    
    
    def version2tool(wildcards):
        if float(wildcards.tool_version) <= 0.1.18:
            return "/usr/bin/tool"
        else:
            return "/usr/local/bin/tool"
    
    
    def tool_version2options(wildcards):
        if float(wildcards.tool_version) <= 0.1.18:
            return "-"
        else:
            return "--T"
    
    
    rule apply_tool:
        input:
            rules.previous_rule.output
        output:
            "tool_results/{tool_version}/tool_output"
        params:
            tool = version2tool,
            tool_options = tool_version2options
        shell:
            """
            {params.tool} {params.tool_options} {input} > {output}
            """
    

    The above assumes you have the desired versions in the config file:

    tool_versions : ["0.1.18", "1.3.1"]