I want to replace the value in the count column (currently zero) with NaN for rows where the sequence or substitute column contains alphabetic characters.
Pandas DataFrame:
IDS      sequence          substitute  count
header1  GCTCAGCTGGCtAGAG  NAN         0
header1  >>>>........<<<<  <           4
Expected output:
IDS      sequence          substitute  count
header1  GCTCAGCTGGCtAGAG  NAN         NaN
header1  >>>>........<<<<  <           4
I tried the code from the question linked below, but had no luck:
Assign value to a pandas dataframe column based on string condition
I cannot get the expected result. Instead, I get:
ids sequence Count count
0 header1 GCTCAGCTGGCtAGAG 0 NaN
1 header1 >>>>.........<<< 0 3
Thank you in advance
Assuming you want to match the rows with a DNA sequence (a/c/g/t letters), you could use str.contains
and boolean indexing:
import numpy as np
# True for rows whose sequence contains a, c, g or t (upper or lower case)
m = df['sequence'].str.contains('[acgt]', case=False)
df.loc[m, 'count'] = np.nan
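For reference, here is a minimal self-contained sketch of the same approach, using the two rows from the question. The way the DataFrame is built is an assumption (the question does not show how it was created), and count is stored as float here so that NaN can be assigned without changing the column's dtype:

```python
import numpy as np
import pandas as pd

# Rows copied from the question. Storing 'count' as float is an assumption
# made so that NaN fits in the column without a dtype change.
df = pd.DataFrame({
    "IDS": ["header1", "header1"],
    "sequence": ["GCTCAGCTGGCtAGAG", ">>>>........<<<<"],
    "substitute": ["NAN", "<"],
    "count": [0.0, 4.0],
})

# Boolean mask: True where the sequence contains at least one a/c/g/t letter.
m = df["sequence"].str.contains("[acgt]", case=False)

# Boolean indexing: overwrite 'count' only on the rows selected by the mask.
df.loc[m, "count"] = np.nan

print(df)
```

Only the first row matches the pattern, so only its count is replaced; the second row keeps its original value.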
Variant to match any letter:
m = df['sequence'].str.contains('[a-z]', case=False)
df.loc[m, 'count'] = np.nan
Or to match rows that do not contain any of >, . or <:
m = ~df['sequence'].str.contains('[>.<]', case=False)
df.loc[m, 'count'] = np.nan
Output:
IDS sequence substitute count
0 header1 GCTCAGCTGGCtAGAG NAN NaN
1 header1 >>>>........<<<< < 4
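If the sequence column may contain missing values, or count is an integer column, a slightly more defensive variant might look like the sketch below. The third row (header2 with a missing sequence) is hypothetical and only there to illustrate the na=False argument. Series.mask is used here in place of the .loc assignment because it returns a new Series and lets pandas pick a dtype that can hold NaN:

```python
import pandas as pd

df = pd.DataFrame({
    "IDS": ["header1", "header1", "header2"],
    # The third row is hypothetical: a missing sequence value.
    "sequence": ["GCTCAGCTGGCtAGAG", ">>>>........<<<<", None],
    "count": [0, 4, 7],
})

# na=False treats rows with a missing sequence as "no match",
# so the mask is purely boolean.
m = df["sequence"].str.contains("[a-z]", case=False, na=False)

# Series.mask replaces values with NaN wherever the mask is True and
# returns a new Series, which is then assigned back to the column.
df["count"] = df["count"].mask(m)

print(df)
```

Without na=False, a missing sequence would produce a missing value in the mask itself, which boolean indexing does not accept.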