I am trying to add a column. I have a numeric column “Y” with values ranging from -50 to 350, and I would like to create a new column “Z” that categorizes those values as follows: -30 to 30 = “Transition”, 31 to 100 = “Early”, 101 to 200 = “Mid”, 201 to 300 = “Late”, and everything else “NA”. I am trying to use the case_when() function inside mutate() from dplyr (see code below), but I keep getting an error message. Any help would be very much appreciated.
DataSetNew <- DataSet %>%
  dplyr::mutate(ColumnZ = case_when(
    ColumnY == < = 30 ~ "Transition",
    ColumnY == between(31,100) ~ "Early",
    ColumnY == between(101,200) ~ "Mid",
    ColumnY == between(201,305) ~ "Late",
    TRUE ~ "NA"
  ))
Error: unexpected '<' in:
" dplyr::mutate(ColumnZ = case_when(
ColumnY == <"
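You need to pass the column of interest to between() as its first argument: between(ColumnY, 31, 100), not ColumnY == between(31, 100). The first condition also fails because "== < =" is not valid R syntax (this is what triggers the "unexpected '<'" error); write ColumnY <= 30 instead. In your question you state 201-300 = "Late", but in your code the upper threshold for "Late" is 305. This example uses the former. Similarly, the question gives -30 as the lower bound for "Transition", whereas your code has no lower bound; this example follows the code, so every value up to and including 30 is "Transition".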
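between(x, left, right) is a vectorised shortcut for x >= left & x <= right, inclusive at both ends. A quick check with made-up values (not from the question):

```r
library(dplyr)

# Illustrative values on either side of the 31-100 band
x <- c(30, 31, 100, 101)

# Both endpoints are included
in_band <- between(x, 31, 100)
in_band  # FALSE TRUE TRUE FALSE

# Same result as spelling out the comparison
identical(in_band, x >= 31 & x <= 100)  # TRUE
```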
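Also, instead of TRUE ~ as the catch-all for all other values, dplyr 1.1.0 and later recommend using .default = instead. Note too that TRUE ~ "NA" (quoted) returns the character string "NA", not a missing value; use an unquoted NA if you want genuinely missing values.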
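The difference between a quoted "NA" and a real missing value matters as soon as you filter or count missing rows. A minimal comparison, using made-up values and assuming dplyr 1.1.0 or later (needed for .default):

```r
library(dplyr)

y <- c(10, 500)  # illustrative: one value matches, one does not

# Catch-all returning the character string "NA"
as_string <- case_when(y <= 30 ~ "Transition", TRUE ~ "NA")

# Catch-all returning a genuine missing value
as_missing <- case_when(y <= 30 ~ "Transition", .default = NA_character_)

is.na(as_string)   # FALSE FALSE: "NA" is just text
is.na(as_missing)  # FALSE TRUE
```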
library(dplyr)
# Sample data
DataSet <- data.frame(id = 1:9,
                      ColumnY = c(-30, 30, 31, 100, 101, 200, 201, 300, 301))
# Return ColumnZ
DataSetNew <- DataSet |>
  mutate(ColumnZ = case_when(between(ColumnY, -Inf, 30) ~ "Transition",
                             between(ColumnY, 31, 100) ~ "Early",
                             between(ColumnY, 101, 200) ~ "Mid",
                             between(ColumnY, 201, 300) ~ "Late",
                             .default = NA))
DataSetNew
#   id ColumnY    ColumnZ
# 1  1     -30 Transition
# 2  2      30 Transition
# 3  3      31      Early
# 4  4     100      Early
# 5  5     101        Mid
# 6  6     200        Mid
# 7  7     201       Late
# 8  8     300       Late
# 9  9     301       <NA>
This is the equivalent of:
DataSetNew <- DataSet |>
  mutate(ColumnZ = case_when(ColumnY <= 30 ~ "Transition",
                             ColumnY >= 31 & ColumnY <= 100 ~ "Early",
                             ColumnY >= 101 & ColumnY <= 200 ~ "Mid",
                             ColumnY >= 201 & ColumnY <= 300 ~ "Late",
                             .default = NA))
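If ColumnY can contain non-integer values, bands such as 31 to 100 leave gaps: a value like 30.5 matches none of the conditions above and ends up NA. A sketch that also enforces the -30 lower bound stated in the question writes each band as a half-open interval instead. It assumes dplyr 1.1.0 or later (for .default), and the sample values are made up for illustration:

```r
library(dplyr)

# Illustrative values: out-of-range, boundary and non-integer cases
DataSet <- data.frame(ColumnY = c(-50, -30, 30, 30.5, 100.5, 200, 300, 350))

DataSetNew <- DataSet |>
  mutate(ColumnZ = case_when(
    ColumnY >= -30 & ColumnY <= 30  ~ "Transition",  # [-30, 30]
    ColumnY > 30   & ColumnY <= 100 ~ "Early",       # (30, 100]
    ColumnY > 100  & ColumnY <= 200 ~ "Mid",         # (100, 200]
    ColumnY > 200  & ColumnY <= 300 ~ "Late",        # (200, 300]
    .default = NA_character_                         # below -30, above 300, or missing
  ))

DataSetNew$ColumnZ
# expected values: NA, "Transition", "Transition", "Early", "Mid", "Mid", "Late", NA
```

Because every band shares its boundary with the next one (> 30 then <= 100, and so on), no numeric value between -30 and 300 can fall through to NA.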