regexrstringsequence-analysis

Detecting sequencing using regexes


Imagine I have multiple character strings in a list like this:

[[1]]
 [1] "1-FA-1-I2-1-I2-1-I2-1-EX-1-I2-1-I3-1-FA-1-" 
 [2] "-1-I2-1-TR-1-"                              
 [3] "-1-I2-1-FA-1-I3-1-"                         
 [4] "-1-FA-1-FA-1-NR-1-I3-1-I2-1-TR-1-"          
 [5] "-1-I2-1-"                                   
 [6] "-1-I2-1-FA-1-I2-1-"                         
 [7] "-1-I3-1-FA-1-QU-1-"                         
 [8] "-1-I2-1-I2-1-I2-1-NR-1-I2-1-I2-1-NR-1-"     
 [9] "-1-I2-1-"                                   
[10] "-1-NR-1-I3-1-QU-1-I2-1-I3-1-QU-1-NR-1-I2-1-"
[11] "-1-NR-1-QU-1-QU-1-I2-1-"

I want to use a regex to detect the particular strings where a certain substring precedes another substring, but not necessarily directly preceding the other substring.

For example, let's say that we are looking for FA preceding EX. This would need to match 1 in the list. Even though FA has -1-I2-1-I2-1-I2-1- between itself and EX, the FA still occurs before the EX, hence a match is expected.

How can a generic regex be defined that identifies strings where certain substrings appear before another substring in this manner?


Solution

  • You may use grep.

    x <- c("1-FA-1-I2-1-I2-1-I2-1-EX-1-I2-1-I3-1-FA-1-" ,"-1-I2-1-TR-1-")
    grepl("FA.*EX", x)
    #[1]  TRUE FALSE
    grep("FA.*EX", x)
    #[1] 1