pandas replace assign alphabet

Assign a value to a column based on whether another column contains letters


I want to set the count column to NaN (in place of the zero value) on rows where the sequence or substitute column contains letters.

Pandas DataFrame:

IDS      sequence          substitute  count
header1  GCTCAGCTGGCtAGAG  NAN         O
header1  >>>>........<<<<  <           4

Expected output:

IDS      sequence          substitute  count
header1  GCTCAGCTGGCtAGAG  NAN         NaN
header1  >>>>........<<<<  <           4

I tried the code from the question linked below, but it did not work:

Assign value to a pandas dataframe column based on string condition

The count column does not change as expected. Instead I get:

    ids      sequence         Count count
0  header1  GCTCAGCTGGCtAGAG     0   NaN
1  header1  >>>>.........<<<     0   3

Thank you in advance


Solution

  • Assuming you want to match the rows with a DNA sequence (a/c/g/t letters), you could use str.contains and boolean indexing:

    import numpy as np
    # True where the sequence contains a, c, g or t (any case)
    m = df['sequence'].str.contains('[acgt]', case=False)
    df.loc[m, 'count'] = np.nan
    

    Variant to match any letter:

    m = df['sequence'].str.contains('[a-z]', case=False)
    df.loc[m, 'count'] = np.nan
    

    Or to match rows that do not contain any of >/./<:

    m = ~df['sequence'].str.contains('[>.<]')
    df.loc[m, 'count'] = np.nan
    

    Output:

           IDS          sequence substitute count
    0  header1  GCTCAGCTGGCtAGAG        NAN   NaN
    1  header1  >>>>........<<<<          <     4
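For reference, here is a minimal self-contained sketch of the "any letter" variant above. The sample values are assumptions modelled on the question, not the asker's real data. Two precautions are also assumptions on my part: the count column is cast to float first, because recent pandas versions warn or raise when NaN is written into an integer column, and `na=False` is passed so that a missing sequence counts as "no match" instead of producing a NaN in the mask.

```python
import numpy as np
import pandas as pd

# Sample frame modelled on the question; the values are assumptions.
df = pd.DataFrame({
    "IDS": ["header1", "header1"],
    "sequence": ["GCTCAGCTGGCtAGAG", ">>>>........<<<<"],
    "substitute": [np.nan, "<"],
    "count": [0, 4],
})

# NaN is a float, so make the column float before writing NaN into it.
df["count"] = df["count"].astype(float)

# True for rows whose sequence contains at least one letter (any case).
# na=False keeps the mask strictly boolean if a sequence is missing.
has_letter = df["sequence"].str.contains("[a-z]", case=False, na=False)

# Blank out the count on the matching rows only.
df.loc[has_letter, "count"] = np.nan

print(df)
```

Because the column is float here, the untouched count prints as 4.0 rather than 4. If the integer look matters, pandas' nullable `Int64` dtype may be an option.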