
Having difficulty using rle() within a mutate() step in R to count the maximum number of consecutive identical characters in a word


I wrote this expression to count the maximum number of consecutive identical characters in a word:

max(rle(unlist(strsplit("happy", split = "")))$lengths)
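For reference, here is the same one-liner broken into its steps for a single word, using base R only. The intermediate names chars and runs are illustrative and not part of the original code.

```r
# Break the one-liner into steps for a single word (base R only)
chars <- unlist(strsplit("happy", split = ""))  # "h" "a" "p" "p" "y"
runs <- rle(chars)                              # run-length encoding of the letters
runs$values                                     # "h" "a" "p" "y"
runs$lengths                                    # 1 1 2 1
max(runs$lengths)                               # 2, from the "pp" run
```

strsplit() returns a list with one element per input string. For a single word, unlist() simply extracts that one element, so the result is the longest run in that word.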

The expression works on individual words, but it does not work when I use it inside a mutate() step. Here is the code with the mutate step:

text3 <- "The most pressing of those issues, considering the franchise's 
stated goal of competing for championships above all else, is an apparent 
disconnect between Lakers vice president of basketball operations and general manager"

text3_df <- tibble(line = 1:1, text3)

text3_df %>%
  unnest_tokens(word, text3) %>% 
  mutate(
    num_letters = nchar(word),
    num_vowels = get_count(word),
    num_consec_char = max(rle(unlist(strsplit(word, split = "")))$lengths)
  )

The variables num_letters and num_vowels work fine, but I get a 2 for every value of num_consec_char. I can't figure out what I'm doing wrong.


Solution

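  • The expression rle(unlist(strsplit(word, split = "")))$lengths is not vectorized over words. Inside mutate(), word refers to the entire column, so strsplit() splits every word, unlist() joins all of their characters into one long vector, and max() reduces that to a single number (here 2). mutate() then recycles that one value to every row, which is why every row shows the same result.

    You need to apply the expression to each word separately, using some kind of loop (e.g. a for loop, sapply()/vapply(), or purrr::map()).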

    library(dplyr)
    library(tidytext)
    
    text3 <- "The most pressing of those issues, considering the franchise's 
    stated goal of competing for championships above all else, is an apparent 
    disconnect between Lakers vice president of basketball operations and general manager"
    
    text3_df <- tibble(line = 1:1, text3)
    
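    output <- text3_df %>%
      unnest_tokens(word, text3) %>%
      mutate(
        num_letters = nchar(word)
        # num_vowels = get_count(word)  # get_count() is the asker's own helper, not defined here
      )
    
    # Apply the expression to one word at a time; USE.NAMES = FALSE avoids a named vector
    output$num_consec_char <- sapply(output$word, function(word) {
      max(rle(unlist(strsplit(word, split = "")))$lengths)
    }, USE.NAMES = FALSE)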
    output
    
    
    # A tibble: 32 × 4
        line word        num_letters num_consec_char
       <int> <chr>             <int>           <int>
     1     1 the                   3               1
     2     1 most                  4               1
     3     1 pressing              8               2
     4     1 of                    2               1
     5     1 those                 5               1
     6     1 issues                6               2
     7     1 considering          11               1
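A base-R sketch of why the mutate() version returns one value for every row, and of a vectorized alternative to the separate sapply() step. The helper name max_run_length() is hypothetical and not part of any package; returning 0 for an empty character vector is an assumption.

```r
# Base R only: reproduce what mutate() sees, then fix it with a vectorized helper.
words <- c("the", "most", "pressing", "of", "those", "issues")

# Original expression on the whole column: unlist() concatenates the letters of
# every word, so max() returns a single number that mutate() recycles to all rows.
whole_column <- max(rle(unlist(strsplit(words, split = "")))$lengths)
whole_column  # 2

# Concatenation can also create runs that span word boundaries:
# "ab" + "bc" becomes "a" "b" "b" "c", a run of 2 that neither word contains.
spurious <- max(rle(unlist(strsplit(c("ab", "bc"), split = "")))$lengths)
spurious  # 2

# Hypothetical vectorized helper: one integer result per input string.
# The length-0 guard avoids max() returning -Inf (with a warning); 0 is an assumed default.
max_run_length <- function(x) {
  vapply(strsplit(x, split = ""), function(ch) {
    if (length(ch) == 0L) 0L else max(rle(ch)$lengths)
  }, integer(1))
}

max_run_length(words)          # 1 1 2 1 1 2
max_run_length(c("ab", "bc"))  # 1 1
```

Because max_run_length() returns one value per element, it could be called directly in the original pipeline, e.g. mutate(num_consec_char = max_run_length(word)), once the helper is defined.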