Here is my basic issue:
I have the following:
file name: parseFastq.py
execution: via command line
code to run it: python3 parseFastq.py --fastq /Users/remaining_dir/test1.fastq
This code works!
However, when I copy the components of parseFastq.py into another script, issues arise.
Below is the code:
The class is defined first; this part works and runs fine in my new script.
import argparse
import gzip
#Example use is
# python parseFastq.py --fastq /Users/remaining_dir/test1.fastq
################################################
# You can use this code and put it in your own script
class ParseFastQ(object):
    """Returns a read-by-read fastQ parser analogous to file.readline()"""

    def __init__(self, filePath, headerSymbols=('@', '+')):
        """Returns a read-by-read fastQ parser analogous to file.readline().
        Example: next(parser)
        -OR-
        It's an iterator, so you can do:
            for rec in parser:
                ... do something with rec ...
        rec is a tuple: (seqHeader, seqStr, qualHeader, qualStr)
        """
        if filePath.endswith('.gz'):
            # 'rt' (text mode) so readline() returns str, not bytes
            self._file = gzip.open(filePath, 'rt')
        else:
            # the 'U' mode flag was removed in Python 3.11; text mode
            # already handles \n, \r\n and \r line endings
            self._file = open(filePath, 'r')
        self._currentLineNumber = 0
        self._hdSyms = headerSymbols

    def __iter__(self):
        return self

    def __next__(self):
        """Reads in next element, parses, and does minimal verification.
        Returns: tuple: (seqHeader, seqStr, qualHeader, qualStr)"""
        # ++++ Get Next Four Lines ++++
        elemList = []
        for i in range(4):
            line = self._file.readline()
            self._currentLineNumber += 1  ## increment file position
            if line:
                elemList.append(line.strip('\n'))
            else:
                elemList.append(None)
        # ++++ Check Lines For Expected Form ++++
        trues = [bool(x) for x in elemList].count(True)
        nones = elemList.count(None)
        # -- Check for acceptable end of file --
        if nones == 4:
            raise StopIteration
        # -- Make sure we got 4 full lines of data --
        assert trues == 4, (
            "** ERROR: It looks like I encountered a premature EOF or empty line.\n"
            "Please check FastQ file near line number %s (plus or minus ~4 lines) and try again**"
            % self._currentLineNumber)
        # -- Make sure we are in the correct "register" --
        assert elemList[0].startswith(self._hdSyms[0]), (
            "** ERROR: The 1st line in fastq element does not start with '%s'.\n"
            "Please check FastQ file near line number %s (plus or minus ~4 lines) and try again**"
            % (self._hdSyms[0], self._currentLineNumber))
        assert elemList[2].startswith(self._hdSyms[1]), (
            "** ERROR: The 3rd line in fastq element does not start with '%s'.\n"
            "Please check FastQ file near line number %s (plus or minus ~4 lines) and try again**"
            % (self._hdSyms[1], self._currentLineNumber))
        # -- Make sure the seq line and qual line have equal lengths --
        assert len(elemList[1]) == len(elemList[3]), (
            "** ERROR: The length of Sequence data and Quality data of the last record aren't equal.\n"
            "Please check FastQ file near line number %s (plus or minus ~4 lines) and try again**"
            % self._currentLineNumber)
        # ++++ Return fastQ data as tuple ++++
        return tuple(elemList)
##########################################################################
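A note on the .gz branch: calling gzip.open() without a mode opens the file in binary mode ('rb'), so readline() returns bytes and a str-based call like line.strip('\n') raises a TypeError. Opening it with mode 'rt' returns str lines instead. A small self-contained check, assuming a made-up one-record file written to a temporary directory:

```python
import gzip
import os
import tempfile

# Made-up single FASTQ record, used only for this demonstration.
record = "@read1\nACGT\n+\nIIII\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.fastq.gz")
    # Write the record compressed, in text mode.
    with gzip.open(path, "wt") as out:
        out.write(record)

    # Default mode is "rb": readline() returns bytes.
    with gzip.open(path) as fh:
        raw_line = fh.readline()

    # "rt" decodes to str, matching what a plain open(path, "r") returns.
    with gzip.open(path, "rt") as fh:
        text_lines = [line.strip("\n") for line in fh]

print(type(raw_line).__name__)  # bytes
print(text_lines)               # ['@read1', 'ACGT', '+', 'IIII']

# strip() with a str argument fails on bytes.
try:
    raw_line.strip("\n")
except TypeError as exc:
    print("TypeError:", exc)
```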
This is the code that does not work when I run it within the same script; the problem has to do with how I pass the pieces in:
if __name__ == "__main__":
parser = argparse.ArgumentParser(description='Process fasq files and seperaate into 4 categories')
parser.add_argument("-f", "--fastq", required=True, help="Place fastq inside here")
args = parser.parse_args()
fastqfile = ParseFastQ(args.fastq)
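Side note on the last line: ParseFastQ(args.fastq) returns the parser object itself, which is an iterator; the (seqHeader, seqStr, qualHeader, qualStr) tuples only come out when you call next() on it or loop over it. To show that protocol in isolation, here is a minimal generator-based stand-in (read_fastq is a hypothetical name, not part of the original script) run on made-up in-memory data:

```python
import io
from itertools import islice


def read_fastq(handle):
    """Hypothetical helper: yield (seqHeader, seqStr, qualHeader, qualStr) tuples."""
    while True:
        # Take the next four lines of the file as one record.
        chunk = [line.rstrip("\n") for line in islice(handle, 4)]
        if not chunk:
            return  # clean end of file
        if len(chunk) != 4:
            raise ValueError("truncated FASTQ record: %r" % (chunk,))
        yield tuple(chunk)


# Made-up two-record file held in memory instead of on disk.
handle = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGG\n+\nFF\n")
records = read_fastq(handle)

first = next(records)   # one tuple at a time...
rest = list(records)    # ...or drain the remaining records
print(first)  # ('@r1', 'ACGT', '+', 'IIII')
print(rest)   # [('@r2', 'GG', '+', 'FF')]
```

With the class itself, the equivalent is next(fastqfile) for a single record, or "for rec in fastqfile:" to walk through all of them.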
I tried the following, but I cannot get fastqfile to give me the records, each of which should be a tuple of the form (seqHeader, seqStr, qualHeader, qualStr).
Attempt:
parser.add_argument("-/Users/remaining_dir/test1.fastq", "--fastq", required=True, help="Place fastq inside here")
Error:
argument -/Users/remaining_dir/test1.fastq/--fastq: conflicting option string: --fastq
Attempt:
parser.add_argument("-/Users/remaining_dir/test1.fastq", "-@", required=True, help="Place fastq inside here")
Out[332]:
_StoreAction(option_strings=['-/Users/remaining_dir/test1.fastq', '-@'], dest='/Users/remaining_dir/test1.fastq', nargs=None, const=None, default=None, type=None, choices=None, help='Place fastq inside here', metavar=None)
Then, running the next line:
Error:
usage: [-h] -/Users/remaining_dir/test1.fastq
/USERS/REMAINING_DIR/TEST1.FASTQ
: error: the following arguments are required: -/Users/remaining_dir/test1.fastq/-@
An exception has occurred, use %tb to see the full traceback.
SystemExit: 2
Running %tb gave the following:
File "/Users/brownbear/opt/anaconda3/lib/python3.7/argparse.py", line 2508, in error
self.exit(2, _('%(prog)s: error: %(message)s\n') % args)
File "/Users/brownbear/opt/anaconda3/lib/python3.7/argparse.py", line 2495, in exit
_sys.exit(status)
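Both errors come from the same misunderstanding: the strings passed to add_argument() are option names (flags), not values, so "-/Users/remaining_dir/test1.fastq" was registered as a flag that argparse then demanded on the command line. A small sketch reproducing both messages with the standard library only (the prog name is just a label for the usage line):

```python
import argparse

parser = argparse.ArgumentParser(prog="parseFastq.py")
# Option names go in add_argument(); the file path is NOT one of them.
action = parser.add_argument("-f", "--fastq", required=True,
                             help="Place fastq inside here")
print(action.dest)  # fastq (taken from the first long option name)

# Registering --fastq a second time reproduces the first error.
conflict = None
try:
    parser.add_argument("-x", "--fastq")
except argparse.ArgumentError as exc:
    conflict = str(exc)
    print(conflict)

# A required option missing from the argument list makes argparse print
# a usage message and exit with status 2 (the "SystemExit: 2" above).
exit_code = None
try:
    parser.parse_args([])
except SystemExit as exc:
    exit_code = exc.code
print(exit_code)  # 2
```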
In case it helps, here is some sample FASTQ data:
@seq13534-419
GCAGTAGCGGTCATAAGTGGTACATTACGAGATTCGGAGTACCATAGATTCGCATGAATCCCTGTGGATACGAGAGTGTGAGATATATGTACGCCAATCCAGTGTGATACCCATGAGATTTAGGACCGATGATGGTTGAGGACCAAGGATTGACCCGATGGATGCAGATTTGACCCCAGATAGAATAAATGCGATGAGATGATTTGGCCGATAGATAGATAGTGTCGTGAGGTGACGTCCGTCACTGGACGAA
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIDIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFFFFDFFDFFDDFDFDFFFFDDFFDDFDDFF
@seq86249-867
GGATTAGCGGTCATAAGTCGTACATTACGAGATTCGGAGTACCATAGATTCGCATGAATCCCTGTGGATACGAGAGTGTGAGATATATGTACGCCAATCCAGTGTGATACCCATGAGATTTAGGACCGATGATGGTTGAGGACCAAGGATTGACCCGATGGATGCAGATTTGACCCCAGATAGAATAAATGCGATGAGATGATTTGGCCGATAGATAGATAGAGGTCAGTATAACCTCTCAAAGCTTTATCTACGGATGGATCCGCGC
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIDIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIDDFDDDDDDFFDFDDFDDDFDFFDDFFFFFFFFFDDFDFFDDFDDF
@seq46647-928
GACCTAGCGGTCATAAGTGGTACATTACGAGATTCGGAGTACCATAGATTCGCATGAATCCCTGTGGATACGAGAGTGTGAGATATATGTACGCCAATCCAGTGTGATACCCATGAGATTTAGGACCGATGATGGTTGACGACCAAGGATTGACCCGATGGATGCAGATTTGACCCCAGATAGAATAAATGCGATGAGATGATTTGGCCGATAGATAGATAGTAAGTAAATGCCACGGACTCGTCACGTG
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIDIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIDDDFDFDFFFFFDFFDFDFDDDDDFDFF
Any help on why this works when I run the script from the command line, but not when I try to incorporate it within another script, would be appreciated.
The solution had two main parts.
I was trying to run the argparse code in an IDE (Spyder), and I was running only selected lines rather than the whole script.
For those who are new to Python and are using argparse for the first time: by default, parse_args() reads its arguments from the command line (sys.argv), so running selected lines in an IDE gives it nothing to parse unless you supply the arguments yourself.
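If you do want to test the argument handling from an IDE console, parse_args() also accepts an explicit list of strings, the words you would have typed after "python3 parseFastq.py". A minimal sketch reusing the parser setup from the question (Spyder's per-file run configuration should also offer a field for command-line options, as an alternative):

```python
import argparse

parser = argparse.ArgumentParser(
    description="Process fastq files and separate into 4 categories")
parser.add_argument("-f", "--fastq", required=True, help="Place fastq inside here")

# From a terminal, parse_args() with no argument list reads sys.argv[1:].
# In an IDE console sys.argv usually holds none of your arguments, so
# pass the would-be command-line words explicitly:
args = parser.parse_args(["--fastq", "test1.fastq"])
print(args.fastq)  # test1.fastq
```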
Therefore, once you've set up your arguments, you run the script as below.
From the command line:
python3 parseFastq.py --fastq test1.fastq
To break this down further: with the initial setup, you are basically labeling your test1.fastq file with the tag --fastq. This is critical: if you get an error that an argument is required, it is because the flag and its value have to be given as a pair (--fastq followed by the file name). In this particular example, you can also use the shorthand -f, so it could also be run as:
From the command line:
python3 parseFastq.py -f test1.fastq
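The pairing rule can be checked directly by feeding argument lists to the same parser definition: -f and --fastq store the value under the same name (args.fastq), and a flag without its value is rejected. A short standard-library sketch:

```python
import argparse

parser = argparse.ArgumentParser(prog="parseFastq.py")
parser.add_argument("-f", "--fastq", required=True, help="Place fastq inside here")

long_form = parser.parse_args(["--fastq", "test1.fastq"])
short_form = parser.parse_args(["-f", "test1.fastq"])
joined = parser.parse_args(["--fastq=test1.fastq"])  # flag and value in one word

# All three spellings store the value under the same name, args.fastq.
print(long_form.fastq, short_form.fastq, joined.fastq)

# A flag without its value breaks the pair: argparse reports
# "expected one argument" and exits with status 2.
missing_value_code = None
try:
    parser.parse_args(["--fastq"])
except SystemExit as exc:
    missing_value_code = exc.code
```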
As long as you run the script from the directory that contains your input files, you do not need the full path.
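The reason is that a relative path like test1.fastq is resolved against the current working directory (where your terminal is when you type python3), not against the folder the .py file lives in. A small sketch, assuming a made-up empty test1.fastq created in a temporary directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    original_cwd = os.getcwd()
    os.chdir(tmp)  # pretend the terminal is "in" the data directory
    try:
        # Made-up empty file standing in for test1.fastq.
        open("test1.fastq", "w").close()
        found = os.path.exists("test1.fastq")
        full_path = os.path.abspath("test1.fastq")
        expected = os.path.join(os.getcwd(), "test1.fastq")
    finally:
        os.chdir(original_cwd)  # restore, so the temp dir can be removed

print(found)                  # True
print(full_path == expected)  # True
```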