I am looking for a smarter way to create a new factor column in an R data frame df.
I have a data frame to which I would like to add a new column that tells me which section each record belongs to. The sections are:
section_in_text <- factor(c('Introduction', 'Characters', 'Footnotes', 'Bibliography'))
The section a given record belongs to is determined by the column df$page.
So far I have achieved this with a function that looks like this:
document_sections <- function(x) {
  if (x < 5) {
    return("Introduction")
  } else if ((5 <= x) && (x < 23)) {
    return("Characters")
  }
  ...
}
Then I applied it to each page with sapply():
df$section <- sapply(df$page, document_sections)
Is there a smarter way to achieve the same result?
Thanks.
Using cut():
df <- data.frame(page = seq(1, 40, by = 2))
df$section <- cut(
  df$page,
  breaks = c(-Inf, 5, 23, 30, Inf),
  labels = c('Introduction', 'Characters', 'Footnotes', 'Bibliography'),
  right = FALSE
)
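The right = FALSE argument is what makes each interval closed on the left, so that page 5 already counts as 'Characters', matching the x < 5 test in the question. A small sketch of the difference on the boundary pages (the vector pages and the names left_closed and right_closed are made up for illustration):

```r
# Hypothetical boundary pages, chosen only to illustrate the interval edges
pages  <- c(4, 5, 22, 23)
breaks <- c(-Inf, 5, 23, 30, Inf)
labels <- c('Introduction', 'Characters', 'Footnotes', 'Bibliography')

# right = FALSE: intervals are [a, b), so page 5 falls into 'Characters'
left_closed <- cut(pages, breaks = breaks, labels = labels, right = FALSE)

# default right = TRUE: intervals are (a, b], so page 5 stays in 'Introduction'
right_closed <- cut(pages, breaks = breaks, labels = labels)

data.frame(pages, left_closed, right_closed)
```

With the default setting the boundaries would shift by one page, which is probably not what the original function intended.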
Or using dplyr::case_when():
library(dplyr)
df %>%
  mutate(section = factor(
    case_when(
      page < 5 ~ 'Introduction',
      page < 23 ~ 'Characters',
      page < 30 ~ 'Footnotes',
      !is.na(page) ~ 'Bibliography'
    ),
    levels = c('Introduction', 'Characters', 'Footnotes', 'Bibliography')
  ))
Result from either approach:
   page      section
1     1 Introduction
2     3 Introduction
3     5   Characters
4     7   Characters
5     9   Characters
6    11   Characters
7    13   Characters
8    15   Characters
9    17   Characters
10   19   Characters
11   21   Characters
12   23    Footnotes
13   25    Footnotes
14   27    Footnotes
15   29    Footnotes
16   31 Bibliography
17   33 Bibliography
18   35 Bibliography
19   37 Bibliography
20   39 Bibliography
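As a quick sanity check that the two approaches agree, something like the following could be run. This is only a sketch: sections, by_cut and by_case_when are names made up here, and it assumes dplyr is installed.

```r
library(dplyr)

sections <- c('Introduction', 'Characters', 'Footnotes', 'Bibliography')
df <- data.frame(page = seq(1, 40, by = 2))

# Approach 1: cut() with left-closed intervals
by_cut <- cut(
  df$page,
  breaks = c(-Inf, 5, 23, 30, Inf),
  labels = sections,
  right = FALSE
)

# Approach 2: case_when(), where the first matching condition wins
by_case_when <- factor(
  case_when(
    df$page < 5 ~ 'Introduction',
    df$page < 23 ~ 'Characters',
    df$page < 30 ~ 'Footnotes',
    !is.na(df$page) ~ 'Bibliography'
  ),
  levels = sections
)

# Compare the assigned labels and the factor levels separately
same_labels <- identical(as.character(by_cut), as.character(by_case_when))
same_levels <- identical(levels(by_cut), levels(by_case_when))
```

Both same_labels and same_levels should come out TRUE for this data. Note that case_when() relies on the order of its conditions, whereas cut() takes all boundaries at once in breaks.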