
R - Identifying names that occur with both an A and a B ending in a column

I have a column in a data frame in R that contains sample names. Some names are identical except that they end in A or B, and some samples are repeated, like this:

    df <- data.frame(Samples = c("S_026A", "S_026B", "S_028A", "S_028B", "S_038A", "S_040_B", "S_026B", "S_38A"))

What I am trying to do is isolate the sample names that occur with both an A and a B ending, and exclude the names that occur with only one of the two.

The result I'm looking for is "S_026" and "S_028", since these are the only names that appear with both an A and a B at the end.
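Because the desired output is the base names themselves, one way to frame this is as a set intersection: strip the trailing letter from the names ending in "A", do the same for the names ending in "B", and keep what the two sets share. This is a minimal base R sketch, assuming the relevant endings are always a single trailing "A" or "B"; the variable names `samples`, `ends_a`, `ends_b` and `both` are illustrative.

```r
# Sample names from the question, as a plain character vector
samples <- c("S_026A", "S_026B", "S_028A", "S_028B",
             "S_038A", "S_040_B", "S_026B", "S_38A")

# Base names (trailing letter removed) of samples ending in "A" and in "B"
ends_a <- sub("A$", "", samples[grepl("A$", samples)])
ends_b <- sub("B$", "", samples[grepl("B$", samples)])

# intersect() returns the unique values present in both sets
both <- intersect(ends_a, ends_b)
print(both)
```

Since `intersect()` also removes duplicates, the repeated "S_026B" does not need any special handling.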

All I can find are ways to remove duplicates, which in this case would only give me "S_026B" and "S_38A".

Alternatively, I have tried stripping the trailing A or B from each name and then counting how many times each stripped name occurs, keeping those with a count > 2, but again, this does not give me the desired result.
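The counting idea can work if it counts distinct endings rather than rows: a repeated sample such as "S_026B" inflates a plain row count without adding an "A". The sketch below, in base R, drops exact repeats first and then keeps base names with two distinct endings. It assumes every relevant name ends in exactly one letter, "A" or "B"; the names `base`, `ending`, `pairs`, `n_endings` and `keep` are hypothetical.

```r
samples <- c("S_026A", "S_026B", "S_028A", "S_028B",
             "S_038A", "S_040_B", "S_026B", "S_38A")

# Split each name into its base (all but the last character) and its ending
base   <- substr(samples, 1, nchar(samples) - 1)
ending <- substr(samples, nchar(samples), nchar(samples))

# Keep only A/B endings and drop exact repeats, so "S_026B" counts once
pairs <- unique(data.frame(base, ending, stringsAsFactors = FALSE))
pairs <- pairs[pairs$ending %in% c("A", "B"), ]

# A base name with both endings now appears exactly twice
n_endings <- table(pairs$base)
keep <- names(n_endings)[n_endings == 2]
print(keep)
```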

Any suggestions?


  • We could group by the name without its last character (using `substr`), then use `substring` to get the last character of each name and keep only the groups where both 'A' and 'B' are present:

    library(dplyr)

    df %>% 
       group_by(grp = substr(Samples, 1, nchar(Samples) - 1)) %>% 
       # keep groups whose final characters include both "A" and "B"
       filter(all(c("A", "B") %in% substring(Samples, nchar(Samples)))) %>% 
       ungroup() %>% 
       select(-grp)

    # A tibble: 5 x 1
      Samples
      <chr>  
    1 S_026A 
    2 S_026B 
    3 S_028A 
    4 S_028B 
    5 S_026B 
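The pipeline above returns the matching rows, while the question asks for the base names "S_026" and "S_028". A small extension, sketched under the assumption that dplyr is installed, keeps the grouping column and reduces it to its distinct values; `base_names` is an illustrative name.

```r
library(dplyr)

df <- data.frame(Samples = c("S_026A", "S_026B", "S_028A", "S_028B",
                             "S_038A", "S_040_B", "S_026B", "S_38A"),
                 stringsAsFactors = FALSE)

base_names <- df %>%
  group_by(grp = substr(Samples, 1, nchar(Samples) - 1)) %>%
  # same filter as above: the group must contain both endings
  filter(all(c("A", "B") %in% substring(Samples, nchar(Samples)))) %>%
  ungroup() %>%
  # one row per base name, in order of first appearance
  distinct(grp) %>%
  pull(grp)

print(base_names)
```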