Now I need a way for Python to find nucleotide position 5 of the above sequence and extract the sequence from there up to position 12 (ATGG*CTTTACCTCGTC*TCACAGGAG). So the output should be something like this:
```
>CCODE1112_5..11
CTTTACCTCGTC
```
How can I tell Python to take the begin value after the first "_" and the end value after the "..", so that it does this automatically? Thanks!
```python
def extractseq(queryseq, begin=5, end=12):
    queryseq = queryseq.split('\n')  # split the FASTA string into a list of lines
    # begin/end are 1-based and inclusive; Python slices are 0-based and end-exclusive
    return queryseq[1][begin - 1:end]
```
I think this function should work. Beware that indexing starts at 0 in Python.
Once you have written it in your script, you just have to call the function: subs = extractseq(seq, 5, 12)
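To make the 0-based indexing point concrete, here is a small sketch using the sequence from the question. It assumes the coordinates are 1-based and inclusive, as in GenBank-style `5..12` locations; the question does not state that convention explicitly, so treat it as an assumption.

```python
seq = "ATGGCTTTACCTCGTCTCACAGGAG"  # sequence from the question

begin, end = 5, 12  # assumed 1-based, inclusive coordinates

# Python strings are 0-based and a slice excludes its stop index,
# so the 1-based inclusive range [begin, end] maps to seq[begin - 1:end].
sub = seq[begin - 1:end]
print(sub)  # CTTTACCT

# Subtracting 1 from the end as well stops one base early.
off_by_one = seq[begin - 1:end - 1]
print(off_by_one)  # CTTTACC
```

The length check `len(sub) == end - begin + 1` is a quick way to confirm that both ends were included.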
OK, sorry. If you want to extract the 5 and the 12 from the header line itself, one easy way to do it is:
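```python
header = queryseq.split('\n')[0]  # first line, e.g. ">CCODE1112_5..11"
substring = header.split('_')[1].split('..')  # "5..11" -> ['5', '11']
begin = int(substring[0])  # convert the strings to integers for slicing
end = int(substring[1])
```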
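The header parsing and the extraction can also be combined so the whole thing runs automatically over a FASTA file. This is a sketch under stated assumptions: the helper names `parse_fasta` and `extract_from_header` are hypothetical, the header is assumed to contain `_begin..end`, and the coordinates are assumed to be 1-based and inclusive. It uses a regular expression instead of chained `split` calls, and unlike the snippets above it also handles several records and sequences wrapped over multiple lines.

```python
import re

# Assumed header convention: "_" followed by "begin..end" (e.g. ">CCODE1112_5..16").
COORDS = re.compile(r"_(\d+)\.\.(\d+)")


def parse_fasta(text):
    """Yield (header, sequence) pairs; wrapped sequence lines are joined."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def extract_from_header(text):
    """Return FASTA text holding, for each record, the slice named in its header."""
    out = []
    for header, seq in parse_fasta(text):
        match = COORDS.search(header)
        if match is None:
            continue  # skip records whose header has no coordinates
        begin, end = int(match.group(1)), int(match.group(2))
        # 1-based inclusive coordinates -> 0-based, end-exclusive slice
        out.append(f"{header}\n{seq[begin - 1:end]}")
    return "\n".join(out)


if __name__ == "__main__":
    # Hypothetical record built from the sequence in the question.
    record = ">CCODE1112_5..16\nATGGCTTTACCTCGTCTCACAGGAG"
    print(extract_from_header(record))
```

Run on that record, it prints the header followed by CTTTACCTCGTC; with inclusive coordinates, that 12-base stretch spans positions 5..16 of the question's sequence. For a real file, read it with `open(path).read()` and pass the text in.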