
Using expand to concatenate .tab files in subdirectories which are variables themselves


I have two variables, GENES and SAMPLES, and I would like to concatenate all .tab files for "BOB" and "LISA" separately. How can I do this in Snakemake? With expand as used below, the tab files for both BOB and LISA are concatenated together.

GENES=["BOB","LISA"]
SAMPLES=["FB_399","FB_400"]

rule all:
    input:
        expand("/path/to/{gene}/ALL_final.tab", gene=GENES)
   
...some other code here which produces the .tab files

rule cat:
    input:
         expand("/path/to/{gene}/{sample}.annotation.tab", sample=SAMPLES, gene=GENES)
    output:
         temp("/path/to/{gene}/all.tab"),
         "/path/to/{gene}/ALL_final.tab"
    shell:
        """
        awk 'FNR > 1 {{print FILENAME "\t" $0}}' {input[0]} > {output[0]}
        sed -i 's/.annotation.tab//g' {output[0]}
        cat header.txt {output[0]} > {output[1]}
        """

Solution

  • To keep specific wildcards as variables during expansion, enclose them in double curly brackets {{like_so}}. For example, expand('{a}_{{b}}', a=[1,2]) will generate ['1_{b}', '2_{b}'].

    In your specific case, the following should work. The awk step reads {input} rather than a single indexed file, so that every sample file for the current gene is included:

    rule cat:
        input:
            # {{gene}} stays a wildcard; only {sample} is expanded
            expand("/path/to/{{gene}}/{sample}.annotation.tab", sample=SAMPLES)
        output:
            temp("/path/to/{gene}/all.tab"),
            "/path/to/{gene}/ALL_final.tab"
        shell:
            """
            awk 'FNR > 1 {{print FILENAME "\t" $0}}' {input} > {output[0]}
            sed -i 's/.annotation.tab//g' {output[0]}
            cat header.txt {output[0]} > {output[1]}
            """
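
To see why the double curly brackets matter, here is a minimal plain-Python sketch of the expansion. The helper expand_sketch is a hypothetical stand-in written for illustration, not Snakemake's own expand; it assumes the pattern is filled with str.format over every combination of the supplied values, which is enough to reproduce the behaviour described above.

```python
from itertools import product


def expand_sketch(pattern, **wildcards):
    """Hypothetical stand-in for expand(): fill `pattern` with every
    combination of the given wildcard values using str.format."""
    names = list(wildcards)
    combos = product(*(wildcards[name] for name in names))
    return [pattern.format(**dict(zip(names, combo))) for combo in combos]


GENES = ["BOB", "LISA"]
SAMPLES = ["FB_399", "FB_400"]

# Expanding both wildcards: each job would receive the files of every gene.
all_files = expand_sketch(
    "/path/to/{gene}/{sample}.annotation.tab", sample=SAMPLES, gene=GENES
)

# Escaping {gene} with double brackets: the placeholder survives expansion.
per_gene = expand_sketch(
    "/path/to/{{gene}}/{sample}.annotation.tab", sample=SAMPLES
)

# Mimic the later per-job wildcard substitution for gene "BOB".
bob_inputs = [path.format(gene="BOB") for path in per_gene]

print(len(all_files))
print(per_gene)
print(bob_inputs)
```

With the escaped form, each gene's job only sees its own sample files, which is what keeps the BOB and LISA outputs separate.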