python snakemake

Zip input and output without wildcards, using absolute paths


This is very similar to the question "Coupling inputs and outputs without shared wildcards": I have assembled a user-specified list of paired inputs and outputs:

inputs = ["/path/to/input1", "/path/to/input2"]
outputs = ["/path/to/output1", "/path/to/output2"]

In other words, I have a rule that should process input1 to generate output1 and, in parallel, process input2 to generate output2. I tried the recommended answer from the linked question, shown below, except that I used absolute paths and removed the ".txt" suffix from the rules:

numbers = ['/tmp/1.txt', '/tmp/2.txt', '/tmp/3.txt', '/tmp/4.txt']
letters = ['/tmp/A.txt', '/tmp/B.txt', '/tmp/C.txt', '/tmp/D.txt']

ln = dict(zip(numbers, letters))

rule all:
    input:
        expand('{number}', number= numbers),

rule out:
    input:
        letter= lambda wc: ln[wc.number],
    output:
        '{number}'
    shell:
        """
        echo {input.letter} > {output}
        """

I have tried touching the inputs (/tmp/A.txt, etc.), but no matter how I arrange things, I get either key exceptions or missing-input exceptions. If I use relative paths instead of absolute paths, I can get it to work. Is there any way to make it work with all absolute paths?


Solution

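  • The error occurs because the '{number}' wildcard is too broad. With the output pattern '{number}', the rule claims it can produce any path at all, including its own inputs such as /tmp/A.txt. When Snakemake checks whether some rule can produce /tmp/A.txt, it matches rule out with number=/tmp/A.txt and calls the input function, which looks up a key that is not in ln: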

    ...
    Error:
      KeyError: '/tmp/A.txt' ## Notice here.
    Wildcards:
      number=/tmp/A.txt
    ...
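
The failure can be reproduced outside Snakemake. The following is only a sketch in plain Python, not Snakemake's actual matching code: the regex `.+` stands in for an unconstrained wildcard, and `resolve_input` is a hypothetical helper that mimics matching the output pattern and then calling the input function.

```python
import re

numbers = ['/tmp/1.txt', '/tmp/2.txt']
letters = ['/tmp/A.txt', '/tmp/B.txt']
ln = dict(zip(numbers, letters))

# Stand-in for an unconstrained '{number}' output pattern:
# any non-empty string matches.
unconstrained = re.compile(r'(?P<number>.+)')

def resolve_input(path):
    """Hypothetical helper: match the output pattern, then look up the input."""
    match = unconstrained.fullmatch(path)
    return ln[match.group('number')]

# The requested target resolves to its paired input.
print(resolve_input('/tmp/1.txt'))

# The input file itself also matches '{number}', so the lookup is
# attempted for it as well and fails.
try:
    resolve_input('/tmp/A.txt')
except KeyError as err:
    print('KeyError:', err)
```

The second call fails for the same reason as the error above: the path is a valid match for the wildcard but not a key of `ln`.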
    

    To fix this, you can use wildcard_constraints to limit the scope of the number wildcard:

    import re  # made explicit because re.escape is used in the constraint below
    
    numbers = ['/tmp/1.txt', '/tmp/2.txt', '/tmp/3.txt', '/tmp/4.txt']
    letters = ['/tmp/A.txt', '/tmp/B.txt', '/tmp/C.txt', '/tmp/D.txt']
    
    ln = dict(zip(numbers, letters))
    
    wildcard_constraints:
        number="|".join(map(re.escape, numbers))
    
    rule all:
        input:
            expand('{number}', number=numbers)
    
    rule out:
        input:
            letter=lambda wc: ln[wc.number]
        output:
            '{number}'
        shell:
            """
            echo {input.letter} > {output}
            """
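
To see what the constraint accepts, the same regular expression can be tried in plain Python. This is a sketch that assumes the constraint has to match the whole wildcard value, so `re.fullmatch` is used as an approximation; `matches_rule` is a hypothetical helper, not part of Snakemake.

```python
import re

numbers = ['/tmp/1.txt', '/tmp/2.txt', '/tmp/3.txt', '/tmp/4.txt']
letters = ['/tmp/A.txt', '/tmp/B.txt', '/tmp/C.txt', '/tmp/D.txt']

# Same expression as in wildcard_constraints: an alternation of the
# escaped output paths, so only those exact strings can match.
constraint = "|".join(map(re.escape, numbers))
pattern = re.compile(f"(?P<number>{constraint})")

def matches_rule(path):
    """Hypothetical helper: would the constrained '{number}' accept this path?"""
    return pattern.fullmatch(path) is not None

print([matches_rule(p) for p in numbers])   # every output path is accepted
print([matches_rule(p) for p in letters])   # no input path is accepted
print(matches_rule('/tmp/1xtxt'))           # re.escape keeps '.' literal
```

Because the input paths no longer match the wildcard, the rule is never considered as a producer of /tmp/A.txt and the input function is only called with keys that exist in `ln`.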
    

    To see more clearly what is happening, try the code below, which prints the wildcards each time the input function is called:

    numbers = ['/tmp/1.txt', '/tmp/2.txt', '/tmp/3.txt', '/tmp/4.txt']
    letters = ['/tmp/A.txt', '/tmp/B.txt', '/tmp/C.txt', '/tmp/D.txt']
    
    ln = dict(zip(numbers, letters))
    
    ln["/tmp/A.txt"] = "/tmp/1.txt"
    
    rule all:
        input:
            expand('{number}', number= numbers),
    
    
    
    def get_input(wc):
        print(wc)
        return ln[f"{wc.number}"]
    
    rule outtes:
        input:
            letter= get_input,
        output:
            '{number}'
        shell:
            """
            echo {input.letter} > {output}
            """
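
The chain of lookups that this diagnostic exposes can also be followed in plain Python. The sketch below is an illustration only: `trace` is a hypothetical helper that repeatedly applies the `ln` lookup, and how Snakemake itself reports each case may differ between versions.

```python
numbers = ['/tmp/1.txt', '/tmp/2.txt', '/tmp/3.txt', '/tmp/4.txt']
letters = ['/tmp/A.txt', '/tmp/B.txt', '/tmp/C.txt', '/tmp/D.txt']

ln = dict(zip(numbers, letters))
ln["/tmp/A.txt"] = "/tmp/1.txt"  # the extra entry from the diagnostic code

def trace(target, mapping):
    """Follow input lookups from target; return (visited paths, outcome)."""
    chain = []
    current = target
    while True:
        if current in chain:
            # The same path is requested twice: the lookups loop.
            return chain + [current], "cycle"
        chain.append(current)
        if current not in mapping:
            # The path matches '{number}' but has no entry in the dict.
            return chain, "KeyError"
        current = mapping[current]

print(trace("/tmp/1.txt", ln))
print(trace("/tmp/2.txt", ln))
```

With the extra entry, /tmp/1.txt and /tmp/A.txt point at each other, while /tmp/B.txt is still missing from `ln`. Both outcomes come from the input files matching the unconstrained '{number}' wildcard.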