r string n-gram

Split a string into consecutive substrings of size n in R efficiently


# Input
n <- 2
"abcd" 
# Output
c("ab", "bc", "cd")

I don't want to use a for loop or sapply.


Solution

  • You can use substring(), which is vectorized over its start and end positions:

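    get_n_grams <- function(string, n) {
      len <- nchar(string)
      # Fewer than n characters: there are no n-grams to return
      if (len < n) return(character(0))
      # Start positions 1..(len - n + 1) pair with end positions n..len
      substring(string, seq_len(len - n + 1), n:len)
    }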
    
    get_n_grams("abcd", 2)
    #[1] "ab" "bc" "cd"
    
    get_n_grams("abcd", 3)
    #[1] "abc" "bcd"
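
To see how the positions line up, it can help to build the two index vectors separately before passing them to substring(). This is only an illustrative sketch for the "abcd" example from the question; the names starts, ends and grams are made up here and are not part of the answer above.

```r
string <- "abcd"
n <- 2
len <- nchar(string)

starts <- seq_len(len - n + 1)  # 1 2 3
ends   <- n:len                 # 2 3 4

# substring() pairs starts and ends element-wise, so each
# (start, end) pair yields one n-gram without an explicit loop.
grams <- substring(string, starts, ends)
print(grams)
#[1] "ab" "bc" "cd"
```

Each end position is its start position plus n - 1, which is why both vectors have the same length, len - n + 1.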