# Make one libraries.csv for each pair of ATAC-seq and RNA-seq accession numbers
def read_rna_accession():
    with open('rna_accessions.txt') as f:
        samples = [sample for sample in f.read().split('\n') if len(sample) > 0]  # Remove empty lines
    return samples

def read_atac_accession():
    with open('atac_accessions.txt') as f:
        samples = [sample for sample in f.read().split('\n') if len(sample) > 0]  # Remove empty lines
    return samples

# Read ATAC and RNA accession IDs
atac_SRRs = read_atac_accession()
rna_SRRs = read_rna_accession()
# Define all rule for generating libraries.csv
rule all:
    input:
        expand("{atac_srr}_{rna_srr}_libraries.csv", atac_srr=atac_SRRs, rna_srr=rna_SRRs)

# Rule to create libraries.csv
rule create_libraries_csv:
    output:
        "{atac_srr}_{rna_srr}_libraries.csv"
    run:
        atac_srr = wildcards.atac_srr
        rna_srr = wildcards.rna_srr
        with open(output[0], "w") as f:
            f.write("fastqs,sample,library_type\n")
            f.write(f"atac_seq/{atac_srr},{atac_srr},Chromatin Accessibility\n")
            f.write(f"rna_seq/{rna_srr},{rna_srr},Gene Expression\n")

Hello, I am trying to make a libraries.csv file for each pair of atac_seq and rna_seq accession numbers {SRRXXXXXX}. For now, I have 2 SRRs for atac_seq and 2 SRRs for rna_seq. I want to generate one libraries.csv file per pair, so I should end up with 2 {atac_srr}_{rna_srr}_libraries.csv files.
With the code that I have, a file is generated for every possible combination of the atac_seq and rna_seq accession numbers. I would like to pair them in order instead, so that the first SRR in the ATAC accession list corresponds to the first SRR in the RNA accession list, the second to the second, and so on. Is there a way to do this?
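You could do it with a for loop, as suggested by Swifty. Or, if you want to keep the expand function, pass zip as the second argument so the two lists are paired element by element instead of being combined in every possible way:
expand("{atac_srr}_{rna_srr}_libraries.csv", zip, atac_srr=atac_SRRs, rna_srr=rna_SRRs)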
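To see the difference outside Snakemake, here is a rough plain-Python sketch of the two behaviours. It only imitates what expand does with and without zip and is not Snakemake's own code. The accession names are made-up placeholders, and the length check is my own addition, since zip stops at the shorter list without complaining.

```python
from itertools import product

# Hypothetical placeholder accessions (not real SRR IDs)
atac_SRRs = ["ATAC1", "ATAC2"]
rna_SRRs = ["RNA1", "RNA2"]

pattern = "{atac_srr}_{rna_srr}_libraries.csv"

# Without zip: every ATAC accession is combined with every RNA accession
all_combinations = [
    pattern.format(atac_srr=a, rna_srr=r)
    for a, r in product(atac_SRRs, rna_SRRs)
]

# zip silently truncates to the shorter list, so check the lengths first
if len(atac_SRRs) != len(rna_SRRs):
    raise ValueError("ATAC and RNA accession lists must have the same length")

# With zip: first ATAC with first RNA, second with second, and so on
paired = [
    pattern.format(atac_srr=a, rna_srr=r)
    for a, r in zip(atac_SRRs, rna_SRRs)
]

print(len(all_combinations))  # 4 files from 2 x 2 accessions
print(paired)  # one file per ordered pair
```

The for-loop version would be the same as the last list comprehension, written out as a loop that appends each file name to a list, which you then pass to input in rule all.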