r dataframe function factors

Creating new factor column based on page range


I am looking for a smarter way to create a new factor column in an R data frame df. The new column should tell me which section of the text each record belongs to. The sections are:

section_in_text <- factor(c('Introduction', 'Characters', 'Footnotes', 'Bibliography'))

The section a record belongs to is determined by the column df$page.

So far, I have done this with a function that looks like this:

document_sections <- function(x) {
  if (x < 5) {
    return("Introduction")
  } else if ((5 <= x) & (x < 23)) {
    return("Characters")
  } ...
}

Then I applied it to every page with sapply():

df$section <- sapply(df$page, document_sections)

Is there a smarter way to achieve the same result?

Thanks.


Solution

  • Using cut():

    df <- data.frame(page = seq(1, 40, by = 2))
    
    df$section <- cut(
      df$page, 
      breaks = c(-Inf, 5, 23, 30, Inf),
      labels = c('Introduction', 'Characters', 'Footnotes', 'Bibliography'),
      right = FALSE
    )
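
The role of right = FALSE is easiest to see at the section boundaries. The following is a minimal sketch, not part of the original answer: the vector `pages` is a hypothetical set of test values placed on either side of each break, and the breaks and labels are copied from the cut() call above.

```r
# Hypothetical boundary pages, chosen to sit on either side of each break
pages <- c(4, 5, 22, 23, 29, 30)

sections <- cut(
  pages,
  breaks = c(-Inf, 5, 23, 30, Inf),
  labels = c('Introduction', 'Characters', 'Footnotes', 'Bibliography'),
  right = FALSE  # intervals are closed on the left: [lower, upper)
)

# Page 5 lands in 'Characters', page 23 in 'Footnotes', page 30 in 'Bibliography'
data.frame(page = pages, section = sections)
```

With the default right = TRUE the intervals would be closed on the right instead, so page 5 would be labelled 'Introduction', which does not match the x < 5 condition in the question.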
    

    Or using dplyr::case_when():

    library(dplyr)
    
    df %>%
      mutate(section = factor(
        case_when(
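          # Conditions are checked in order; the first one that is TRUE wins
          page < 5 ~ 'Introduction',
          page < 23 ~ 'Characters', 
          page < 30 ~ 'Footnotes', 
          # Catch-all for every remaining non-missing page
          !is.na(page) ~ 'Bibliography'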
        ),
        levels = c('Introduction', 'Characters', 'Footnotes', 'Bibliography')
      ))
    

    Result from either approach:

       page      section
    1     1 Introduction
    2     3 Introduction
    3     5   Characters
    4     7   Characters
    5     9   Characters
    6    11   Characters
    7    13   Characters
    8    15   Characters
    9    17   Characters
    10   19   Characters
    11   21   Characters
    12   23    Footnotes
    13   25    Footnotes
    14   27    Footnotes
    15   29    Footnotes
    16   31 Bibliography
    17   33 Bibliography
    18   35 Bibliography
    19   37 Bibliography
    20   39 Bibliography
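
Because the new column is a factor whose levels follow the order of the labels, summaries come out in document order rather than alphabetical order. A small sketch of this, assuming the same example data frame and the cut() call from above (the name `counts` is introduced here only for illustration):

```r
# Rebuild the example data and the section column with cut()
df <- data.frame(page = seq(1, 40, by = 2))

df$section <- cut(
  df$page,
  breaks = c(-Inf, 5, 23, 30, Inf),
  labels = c('Introduction', 'Characters', 'Footnotes', 'Bibliography'),
  right = FALSE
)

# Levels keep the order given in `labels`, not alphabetical order
levels(df$section)

# Number of records per section, reported in level order
counts <- table(df$section)
counts
```

For the 20 example pages this should give 2 records in Introduction, 9 in Characters, 4 in Footnotes and 5 in Bibliography, matching the printed result above.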