
Explanation of code that uses lineIndex to collect reads from a FASTQ file


Here the aim is to build a graph from a collection of strings (reads) in a FASTQ file. But first, we implement the following function, which gets the reads. We remove the newline character from the end of each line (with str.strip()) and, by convention, convert all characters in the reads to upper case (with str.upper()). The code for that:

def get_reads(filePath):
    reads = list() # The list of strings that will store the reads (the DNA strings) in the FASTQ file at filePath
    with open(filePath, 'r') as fastqFile: # the file is closed automatically when this block ends
        fastqLines = fastqFile.readlines()

    for lineIndex in range(1, len(fastqLines), 4): # I want this explained
        line = fastqLines[lineIndex]
        reads.append(line.strip().upper())

    return reads
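To see what get_reads returns without needing a real data file, here is a minimal sketch that writes a small, made-up two-record FASTQ file to a temporary location and reads it back. The sample content and the use of tempfile are illustrative assumptions, not part of the original question.

```python
import os
import tempfile


def get_reads(filePath):
    reads = list()  # will hold the sequence (DNA) strings from the FASTQ file
    with open(filePath, 'r') as fastqFile:
        fastqLines = fastqFile.readlines()
    # Start at index 1 and step by 4: one line from each 4-line record.
    for lineIndex in range(1, len(fastqLines), 4):
        reads.append(fastqLines[lineIndex].strip().upper())
    return reads


# Hypothetical sample data: two FASTQ records of four lines each.
sample = "@read1\nacgt\n+\nIIII\n@read2\nttga\n+\nIIII\n"

# delete=False so the file can be reopened by get_reads after it is closed.
with tempfile.NamedTemporaryFile('w', suffix='.fastq', delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    reads = get_reads(path)
    print(reads)  # ['ACGT', 'TTGA']
finally:
    os.remove(path)  # clean up the temporary file
```

Only the lowercase sequence lines come back, upper-cased; the header, '+' and quality lines are skipped.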

My question: what is the purpose of the line for lineIndex in range(1, len(fastqLines), 4)?


Solution

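  • fastqLines is a Python list containing every line read from the file. The loop

    for lineIndex in range(1, len(fastqLines), 4):

    produces lineIndex values of 1, 5, 9, ... up to, but not including, len(fastqLines). Each selected line is stripped, converted to upper case, and appended to the list reads. Because Python lists are indexed from 0, this means the 2nd, 6th, 10th, ... lines of the file are stored in reads.

  • The reason for this pattern is the FASTQ format itself: each record normally occupies four lines, namely a header line starting with '@', the sequence line, a separator line starting with '+', and a line of quality scores. The sequence is therefore always the second line of each record, so starting at index 1 and stepping by 4 picks out exactly the sequence lines and skips the headers, separators and quality scores.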
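The following sketch makes the index pattern concrete on a hypothetical two-record FASTQ file held in memory (the list has the same shape that readlines() would return). It assumes every record takes exactly four lines. It also shows two equivalent ways to select the same lines: extended slicing (fastqLines[1::4]) and itertools.islice, which applies the same start/step idea to any iterable.

```python
from itertools import islice

# Hypothetical two-record FASTQ content, split into lines as readlines() would.
fastqLines = [
    "@read1\n",  # index 0: header line, starts with '@'
    "acgt\n",    # index 1: sequence line
    "+\n",       # index 2: separator line, starts with '+'
    "IIII\n",    # index 3: quality line, one character per base
    "@read2\n",  # index 4: header of the second record
    "ggca\n",    # index 5: sequence line
    "+\n",       # index 6: separator
    "IIII\n",    # index 7: quality
]

# The indices visited by the loop in get_reads: start at 1, step by 4,
# stop before len(fastqLines).
indices = list(range(1, len(fastqLines), 4))
print(indices)  # [1, 5]

# Equivalent selections without an explicit index variable.
by_slice = [line.strip().upper() for line in fastqLines[1::4]]
by_islice = [line.strip().upper() for line in islice(fastqLines, 1, None, 4)]
print(by_slice)  # ['ACGT', 'GGCA']
```

A possible design choice: because islice works on any iterable, passing the open file object itself (islice(fastqFile, 1, None, 4)) would read the sequences lazily instead of loading every line with readlines() first, which may matter for large FASTQ files.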