For a task, I need to implement the logic of the script below (script source). However, in the current version of Snakemake the 'directory' flag is only valid for outputs, not inputs.
How can the same logic be implemented in the current version?
OUTDIR = "first_directory"
SNDDIR = "second_directory"
THRDIR = "third_directory"

def combine(wildcards):
    # read the first set of outputs
    ck_output = checkpoints.make_some_files.get(**wildcards).output[0]
    FIRSTS, = glob_wildcards(os.path.join(ck_output, "{sample}.txt"))
    # read the second set of outputs
    sn_output = checkpoints.make_more_files.get(**wildcards).output[0]
    SECONDS, = glob_wildcards(os.path.join(sn_output, "{smpl}.txt"))
    return expand(os.path.join(THRDIR, "{first}.{second}.tsv"), first=FIRSTS, second=SECONDS)

rule all:
    input:
        combine

checkpoint make_some_files:
    output:
        directory(OUTDIR)
    shell:
        """
        mkdir {output};
        N=$(((RANDOM%5)+1));
        for D in $(seq $N); do
            touch {output}/$RANDOM.txt
        done
        """

checkpoint make_more_files:
    output:
        directory(SNDDIR)
    shell:
        """
        mkdir {output};
        N=$(((RANDOM%5)+1));
        for D in $(seq $N); do
            touch {output}/$RANDOM.txt
        done
        """

rule make_third_files:
    input:
        directory(OUTDIR),
        directory(SNDDIR),
    output:
        os.path.join(THRDIR, "{first}.{second}.tsv")
    shell:
        """
        touch {output}
        """
Solution: link
You can still use the paths to the directories in the input directive, just not with the special directory() flag. That flag is meant for outputs: it signals that it is okay to delete the directory at the start of the run, and that a change in how a directory is displayed should not be detected as a change necessitating re-running the rules, as discussed in the documentation here.
This is the modified version that worked for me with Snakemake version 8.18.1.
OUTDIR = "first_directory"
SNDDIR = "second_directory"
THRDIR = "third_directory"

def combine(wildcards):
    # read the first set of outputs
    ck_output = checkpoints.make_some_files.get(**wildcards).output[0]
    FIRSTS, = glob_wildcards(os.path.join(ck_output, "{sample}.txt"))
    # read the second set of outputs
    sn_output = checkpoints.make_more_files.get(**wildcards).output[0]
    SECONDS, = glob_wildcards(os.path.join(sn_output, "{smpl}.txt"))
    return expand(os.path.join(THRDIR, "{first}.{second}.tsv"), first=FIRSTS, second=SECONDS)

rule all:
    input:
        OUTDIR,
        SNDDIR,
        combine

checkpoint make_some_files:
    output:
        directory(OUTDIR)
    shell:
        """
        mkdir {output};
        N=$(((RANDOM%5)+1));
        for D in $(seq $N); do
            touch {output}/$RANDOM.txt
        done
        """

checkpoint make_more_files:
    output:
        directory(SNDDIR)
    shell:
        """
        mkdir {output};
        N=$(((RANDOM%5)+1));
        for D in $(seq $N); do
            touch {output}/$RANDOM.txt
        done
        """

rule make_third_files:
    input:
        OUTDIR,
        SNDDIR,
    output:
        os.path.join(THRDIR, "{first}.{second}.tsv")
    shell:
        """
        touch {output}
        """
The other change is that I added the two initial output directories to the input of the main rule (rule all).
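To make it clearer which targets the combine function ends up requesting once both checkpoints have finished, here is a minimal plain-Python sketch of the same pairing logic, without Snakemake. The helper names stems and combine_targets and the demo file names are hypothetical, chosen only for illustration; the sketch assumes flat directories containing only .txt files, whereas glob_wildcards does its own pattern matching and may behave differently in other layouts.

```python
import itertools
import os
import tempfile

THRDIR = "third_directory"


def stems(directory):
    """Return the sorted base names (without '.txt') of the .txt files in a directory."""
    return sorted(
        os.path.splitext(name)[0]
        for name in os.listdir(directory)
        if name.endswith(".txt")
    )


def combine_targets(first_dir, second_dir, out_dir=THRDIR):
    """Pair every name from first_dir with every name from second_dir."""
    return [
        os.path.join(out_dir, f"{first}.{second}.tsv")
        for first, second in itertools.product(stems(first_dir), stems(second_dir))
    ]


# Demo with made-up file names standing in for the random ones the checkpoints create.
with tempfile.TemporaryDirectory() as tmp:
    first_dir = os.path.join(tmp, "first_directory")
    second_dir = os.path.join(tmp, "second_directory")
    os.mkdir(first_dir)
    os.mkdir(second_dir)
    for name in ("11", "22"):
        open(os.path.join(first_dir, name + ".txt"), "w").close()
    for name in ("7",):
        open(os.path.join(second_dir, name + ".txt"), "w").close()
    targets = combine_targets(first_dir, second_dir)
    print(targets)
```

With two files in the first directory and one in the second, the sketch yields two targets, one per pair. In the workflow, each such target is then produced by a separate run of make_third_files.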
You can see the set-up I used to work this out in a Jupyter session, along with the result, here.
You can also run it yourself without touching your system, in a temporary Jupyter session served by MyBinder.org: go there, follow the guide at the top to launch a session, then upload that notebook and run all the cells.