
Snakemake - Call multiple scripts in a rule


I have a Snakemake rule that calls a Python script with the script keyword:

rule merge_results:
    input:
        [...]
    output:
        path_results
    script:
        "merge_results.py"

However, I would like to also call an R script in this rule, depending on a condition. I wanted to do something like this:

rule merge_results:
    input:
        [...]
    output:
        path_results
    script:
        "merge_results.py"
        if cond:
            "my_script.R"

but Snakemake won't allow it. I know I could call my scripts with the run directive and shell(), but my Python and R scripts refer to the snakemake.input variable in many places, so I would have to change a lot of their code to call them a different way. Is there a way to avoid that and use the script: keyword with multiple files?


Solution

  • It is not possible to run multiple scripts using the script directive.

    I think you should reconsider your approach. Do these scripts depend on each other? That is, does one need to run before the other? Do they write to the same output file, or do they generate separate results? Would it be possible to split this into two separate rules and define dependencies between them? You could use a 'flag file' for this: let one rule create an empty output file marked as temp(), and use that file as input for the other rule. Or, if the scripts do in fact produce different output files, you can use a target rule that aggregates these files by listing them in its input directive.

    If you are able to split this into two rules, you could use the params directive to compute the condition for the if statement, and then check it within my_script.R to decide whether to run the logic in the script (values from params are exposed to R scripts through snakemake@params, e.g. snakemake@params[["myParam"]]). However, since Snakemake expects to find the rule's output file after execution completes, the script would have to create an empty file when the condition evaluates to false.
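
    The "skip the logic but still create the output" idea can be sketched in Python as follows. This is only an illustration, not part of Snakemake: run_step, enabled and work are hypothetical names, and in a real script run through the script directive the values would come from the injected snakemake object (for example snakemake.params and snakemake.output[0]) rather than from function arguments.

```python
from pathlib import Path
from typing import Callable


def run_step(enabled: bool, output_path: str, work: Callable[[str], None]) -> bool:
    """Run `work` if enabled; otherwise write an empty placeholder output.

    Returns True if the real work ran, False if only the placeholder was created.
    All names here are hypothetical and chosen for illustration.
    """
    out = Path(output_path)
    # Make sure the output directory exists (harmless if it already does).
    out.parent.mkdir(parents=True, exist_ok=True)
    if enabled:
        work(output_path)
        return True
    # Snakemake still expects the declared output to exist after the job,
    # so leave an empty file behind when the step is skipped.
    out.touch()
    return False
```

    Inside a Python script, the call might look like run_step(snakemake.params.run_extra, snakemake.output[0], do_merge), where run_extra and do_merge are again placeholders for your own parameter name and function.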

    If you are unable to split this into multiple rules, one option is to use the run directive in combination with shell(), passing the input and output as command line arguments:

    rule merge_results:
        input:
            [...]
        output:
            path_results
        run:
            shell("python3 merge_results.py -i {input} -o {output}")
            if cond:
                shell("Rscript my_script.R -i {input} -o {output}")
    

    To make this work in Python scripts you will need argparse, or, if you are fine with less flexibility, you could omit the -i and -o flags and read the arguments from sys.argv instead. For R scripts, you can use optparse to read flagged arguments, or commandArgs to read them without flags. Either way, you will have to edit your scripts to replace the snakemake.input references with the arguments provided on the command line.
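
    As a rough sketch of the argparse side, assuming the -i and -o flags from the shell() call above: parse_args is a hypothetical helper name, and nargs="+" is used on the assumption that {input} may expand to several space-separated paths.

```python
import argparse


def parse_args(argv=None):
    """Parse the -i/-o flags passed on the command line by shell()."""
    parser = argparse.ArgumentParser(description="Merge result files.")
    # nargs="+" accepts one or more paths, since {input} can expand
    # to a space-separated list of files.
    parser.add_argument("-i", "--input", nargs="+", required=True,
                        help="one or more input files")
    parser.add_argument("-o", "--output", required=True,
                        help="output file")
    # argv=None makes argparse read sys.argv[1:], as in a normal script run.
    return parser.parse_args(argv)
```

    In merge_results.py you would then call args = parse_args() near the top and use args.input and args.output wherever the script previously used snakemake.input and snakemake.output.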