python split bioinformatics fasta dna-sequence

Create a new variable instance each time I split a string in Python


I have a string in a variable x that contains ">" symbols. I would like to create a new variable for each piece produced when the string is split at the ">" symbol.

The string in the variable x looks like this (imported from a simple .txt file):

>AF1785813
GTGTGGAGGGAAAGGTGTGAACCCGGGAGGCGGAGCTTGCAGTGAGCCAAGATCGCACCACTGCACTCCA
>AF1785815
GTGTGGAGTGAGCCAAGATCGCACCACTGCACTCCATTCAG
>AF1785814
GTGTGGAGGTGTGAACCCGGGAGGCGGAGCTTGCAGTGAGCCAAGATCGCACCACTGCACTCCA

The expected output is:

print(var_1)

>AF1785813
GTGTGGAGGGAAAGGTGTGAACCCGGGAGGCGGAGCTTGCAGTGAGCCAAGATCGCACCACTGCACTCCA

print(var_2)

>AF1785815
GTGTGGAGTGAGCCAAGATCGCACCACTGCACTCCATTCAG

print(var_3)

>AF1785814
GTGTGGAGGTGTGAACCCGGGAGGCGGAGCTTGCAGTGAGCCAAGATCGCACCACTGCACTCCA

To achieve this, I am using a simple for loop:

count = 3
for v in range(0, count+1):
    globals()[f"var_{v}"] = x.split('>')
print(var_3)

This way I successfully get a new variable for each count (count is equal to the number of ">" symbols).

However the output I am currently getting is:

print(var_1)
        
['', 'AF1785813GTGTGGAGGGAAAGGTGTGAACCCGGGAGGCGGAGCTTGCAGTGAGCCAAGATCGCACCACTGCACTCCA', 'AF1785815GTGTGGAGTGAGCCAAGATCGCACCACTGCACTCCATTCAG', 'AF1785814GTGTGGAGGTGTGAACCCGGGAGGCGGAGCTTGCAGTGAGCCAAGATCGCACCACTGCACTCCA']
            
print(var_2)

['', 'AF1785813GTGTGGAGGGAAAGGTGTGAACCCGGGAGGCGGAGCTTGCAGTGAGCCAAGATCGCACCACTGCACTCCA', 'AF1785815GTGTGGAGTGAGCCAAGATCGCACCACTGCACTCCATTCAG', 'AF1785814GTGTGGAGGTGTGAACCCGGGAGGCGGAGCTTGCAGTGAGCCAAGATCGCACCACTGCACTCCA']
            
print(var_3)
        
['', 'AF1785813GTGTGGAGGGAAAGGTGTGAACCCGGGAGGCGGAGCTTGCAGTGAGCCAAGATCGCACCACTGCACTCCA', 'AF1785815GTGTGGAGTGAGCCAAGATCGCACCACTGCACTCCATTCAG', 'AF1785814GTGTGGAGGTGTGAACCCGGGAGGCGGAGCTTGCAGTGAGCCAAGATCGCACCACTGCACTCCA']

How can I troubleshoot the for loop in order to achieve the expected output?


Solution

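  • x.split('>') returns the whole list, so the original loop stores that same list in every variable. Iterate over the split result instead and assign one piece per variable:

    for i, token in enumerate(x.split('>')):
        # skip the empty string that precedes the first '>'
        if token:
            # split() drops the separator, so put '>' back
            globals()[f"var_{i}"] = '>' + token

    # then work with the variables
    print(var_1)
    print(var_2)
    # ... and so on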
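
  • As an alternative to creating variables through globals(), the records could be collected in a dictionary keyed by the header. The following is only a minimal sketch: the function name split_fasta is hypothetical, and it assumes each header and its sequence are separated by a newline in x, as in the .txt file shown in the question. If the newlines were stripped when the file was read, the header and sequence cannot be told apart this way.

```python
def split_fasta(text):
    """Return a dict mapping each record header to its sequence.

    Hypothetical helper; assumes a newline follows each '>' header line.
    """
    records = {}
    for chunk in text.split(">"):
        chunk = chunk.strip()
        if not chunk:
            continue  # skip the empty piece before the first '>'
        # first line is the header, everything after it is sequence
        header, _, sequence = chunk.partition("\n")
        # join sequence lines in case a sequence is wrapped over several lines
        records[header.strip()] = "".join(sequence.split())
    return records


x = """>AF1785813
GTGTGGAGGGAAAGGTGTGAACCCGGGAGGCGGAGCTTGCAGTGAGCCAAGATCGCACCACTGCACTCCA
>AF1785815
GTGTGGAGTGAGCCAAGATCGCACCACTGCACTCCATTCAG
>AF1785814
GTGTGGAGGTGTGAACCCGGGAGGCGGAGCTTGCAGTGAGCCAAGATCGCACCACTGCACTCCA
"""

records = split_fasta(x)
for header, sequence in records.items():
    # print each record back in the '>header' / sequence layout
    print(f">{header}\n{sequence}\n")
```

    With a dictionary, the number of records does not need to be known in advance, and a single record can be looked up by its header, for example records["AF1785815"].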