I am developing an ATACseq pipeline using Genrich to run with Snakemake.
Genrich can call peaks from more than one replicate in a single step, which avoids additional steps (e.g. IDR).
In Snakemake, I have found a way to return all the samples I want (i.e. the replicates from one condition) at the same time, but Genrich asks for comma-separated files as input, or space-separated files if each file is quoted.
Normally, the input returns a list of space-separated files (e.g. file1 file2 file3), and since I don't know how to make it return comma-separated files, I tried to quote them.
In theory, since Snakemake version 5.8.0, you can refer to the input as {input:q} in the rule's shell command to get the quoted input, as said here.
However, in my case, the returned input is not quoted.
I have created a test rule to see how the input is returned:
rule genrich_merge_test:
    input:
        lambda w: expand("{condition}.sorted.bam", condition = SAMPLES.loc[SAMPLES["CONDITION"] == w.condition].NAME),
    output:
        "{condition}_peaks.narrowPeak",
    shell:
        """
        echo {input:q} > {output}
        """
The returned input, which is stored in the output file, is:
rep1.sorted.bam rep2.sorted.bam
Does anyone know how to solve this, so that the input is returned either quoted or as a list of comma-separated files instead of space-separated files?
Thank you.
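A likely reason the test rule shows no quotes: assuming Snakemake's :q formatting relies on Python's shlex.quote (an assumption here, not something stated in the question), quotes are only added to names that need them, and echo would not print them in any case because the shell strips quotes before echo sees its arguments. A minimal sketch of the shlex.quote behaviour on its own, outside Snakemake:

```python
import shlex

# File names taken from the question; they contain only shell-safe characters.
files = ["rep1.sorted.bam", "rep2.sorted.bam"]

# Quote each name individually, then join with spaces.
quoted = " ".join(shlex.quote(f) for f in files)
print(quoted)  # rep1.sorted.bam rep2.sorted.bam (unchanged: nothing to protect)

# A hypothetical name containing a space does get quoted.
print(shlex.quote("rep 1.sorted.bam"))  # 'rep 1.sorted.bam'
```

So an unquoted result for names like rep1.sorted.bam does not by itself mean that quoting is broken.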
Assuming your input filenames do not contain spaces (and if they do, I strongly encourage avoiding them), you can simply put the whole list of files in quotes; you don't need to quote each file in the list:
rule genrich:
    input:
        t=['a.bam', 'b.bam'],
        ...
    shell:
        r"""
        Genrich -t '{input.t}' ...
        """
(Note the single quotes around '{input.t}'.)
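If a comma-separated list is preferred, one option is to build the string in Python instead of in the shell. The following is only a sketch: comma_separated is a hypothetical helper name (not part of Snakemake or Genrich), and it assumes the function is given to the rule's params, where Snakemake passes wildcards first and supplies input by name.

```python
def comma_separated(wildcards, input):
    """Join a rule's input files into a single comma-separated string."""
    return ",".join(input)


# In a Snakefile this could be wired up roughly as follows (untested sketch):
#
#   rule genrich:
#       input:
#           t=['a.bam', 'b.bam'],
#       params:
#           t=comma_separated,
#       shell:
#           "Genrich -t {params.t} ..."

# Stand-alone check with the file names from the question:
print(comma_separated(None, ["rep1.sorted.bam", "rep2.sorted.bam"]))
```

Like the quoted-list approach, this assumes the file names contain no spaces or commas.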