
Mark positions of a string in a list


I have two lists. The first one holds nucleotide values:

nucleotides = ['A', 'G', 'C', 'T', 'A', 'G', 'G', 'A', 'G', 'C']

The second one holds true (1) or false (0) for every letter, indicating whether that letter is covered.

flag_map = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

Initially, the second list holds only false values.

Let's say I want to mark every occurrence of "AGC" in the list.

Then my second list should look like this:

flag_map = [1, 1, 1, 0, 0, 0, 0, 1, 1, 1]

What is the best method that I can use here?


Solution

  • You could join your original list of characters into a single string and then replace every instance of 'AGC' with '111'. Then iterate over the characters of the resulting string: a '1' becomes 1 in the result, anything else becomes 0. This works because a nucleotide string never contains the character '1' on its own.

    Avoid naming the list str: str is a built-in, and shadowing it breaks later calls to str().

    nucleotides = ['A', 'G', 'C', 'T', 'A', 'G', 'G', 'A', 'G', 'C']
    
    # 'AGCTAGGAGC' -> '111TAGG111'; each '1' maps to 1, anything else to 0
    flag_map = [int(char == '1')
                for char in "".join(nucleotides).replace('AGC', '111')]
    

    This gives the desired result:

    [1, 1, 1, 0, 0, 0, 0, 1, 1, 1]
    

    The expression char == '1' checks whether the character is '1', and int(...) converts the boolean True to 1 and False to 0. If booleans are acceptable (for most applications they should be, since bool is a subclass of int), you can skip the conversion to int and simply use char == '1'.
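A minimal illustration of the boolean variant, reusing the list from the question: the flags stay True/False, yet they can still be summed or compared as if they were 1 and 0. The name covered_count is just an illustrative choice.

```python
nucleotides = ['A', 'G', 'C', 'T', 'A', 'G', 'G', 'A', 'G', 'C']

# Same join/replace trick, but keep the booleans instead of calling int().
flag_map = [char == '1' for char in "".join(nucleotides).replace('AGC', '111')]

# bool subclasses int, so the flags work directly in arithmetic.
covered_count = sum(flag_map)  # 6 covered positions
print(flag_map == [1, 1, 1, 0, 0, 0, 0, 1, 1, 1])  # True, since True == 1
```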

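The replace trick is fine for 'AGC', which cannot overlap with itself. For patterns that can overlap (for example 'AGA' in 'AGAGA'), str.replace only handles non-overlapping matches scanned left to right, so some covered positions would stay 0. A sketch of a more general alternative, assuming you want every covered position marked, scans with str.find instead; the helper name flag_pattern is hypothetical.

```python
def flag_pattern(seq, pattern):
    """Return a 0/1 list marking every position of seq covered by pattern.

    seq may be a string or a list of single characters. Overlapping
    occurrences are all marked, unlike the str.replace approach.
    """
    text = "".join(seq)
    flags = [0] * len(text)
    if not pattern:
        return flags  # an empty pattern covers nothing

    start = text.find(pattern)
    while start != -1:
        # Mark every position covered by this occurrence.
        for i in range(start, start + len(pattern)):
            flags[i] = 1
        # Resume one character later so overlapping matches are found too.
        start = text.find(pattern, start + 1)
    return flags


nucleotides = ['A', 'G', 'C', 'T', 'A', 'G', 'G', 'A', 'G', 'C']
print(flag_pattern(nucleotides, 'AGC'))  # [1, 1, 1, 0, 0, 0, 0, 1, 1, 1]
print(flag_pattern('AGAGA', 'AGA'))      # [1, 1, 1, 1, 1]
```

For comparison, "AGAGA".replace('AGA', '111') yields '111GA', which would leave the last two positions unmarked.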