r, string, text, text-mining, sentiment-analysis

Deleting certain segments of strings in R?


In R, suppose I have the following string:

x<-"The bank is going after bank one and pizza and corn."

I would like to delete everything up to and including the FINAL occurrence of bank in the sentence, thus obtaining the string "one and pizza and corn." More generally, is there a way to delete all text before the final occurrence of a specific word in a string?


Solution

  • You can use a regex with a capture group, as shown below:

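    words <- "bank"
    # (?:...) keeps both \\b anchors applied to every alternative if `words` has several entries
    pat <- sprintf("^.*?(\\b(?:%s)\\b).*\\1 ?", paste0(words, collapse = "|"))
    # perl = TRUE selects the PCRE engine
    sub(pat, "", x, perl = TRUE)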
    

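    Explanation: starting at the beginning of the string (^), .*? lazily matches any characters up to the first occurrence of the word, which is bounded by \b and captured as group 1. Then .* greedily matches everything up to the last occurrence of that same word (the backreference \1), and the trailing " ?" consumes the optional space after it. The whole match is replaced with the empty string "". Note that the pattern needs the word to appear at least twice; if it appears only once, nothing matches and sub() returns x unchanged.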
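
As a possible alternative (not the pattern from the answer above), a single greedy .* also reaches the last occurrence and still works when the word appears only once. The sketch below wraps it in a hypothetical helper, drop_through_last, a name introduced here for illustration. It assumes the words contain no regex metacharacters.

```r
# Hypothetical helper (name is illustrative): remove everything up to and
# including the last whole-word occurrence of any entry in `word`.
# Assumes `word` contains no regex metacharacters.
drop_through_last <- function(x, word) {
  # Greedy ".*" runs to the end of the string, then backtracks until the
  # bounded word matches, which is its last occurrence.
  pat <- sprintf("^.*\\b(?:%s)\\b ?", paste0(word, collapse = "|"))
  sub(pat, "", x, perl = TRUE)
}

x <- "The bank is going after bank one and pizza and corn."
drop_through_last(x, "bank")
# [1] "one and pizza and corn."

# A single occurrence is handled as well
drop_through_last("Go to the bank today.", "bank")
# [1] "today."
```

If the word does not occur at all, sub() finds no match and returns the input unchanged.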