Suppose I have a dataframe df in R like this:
A B C D
1.4 4.0 6.0 1.0
2.5 1.5 2.4 2.3
3.0 1.7 2.5 3.4
Say I want to write a function that checks the value of each cell in every row in several specified columns, and performs calculations on them depending on the value, and puts the results in new columns.
Take a simple example. I want to do the following for columns A, B and D: if the value in a given row of that column, call it x, satisfies x < 2, I want to add 1; if 2 <= x < 3, I want to add 3; if 3 <= x < 4, I want to add 5; otherwise I want to leave it unchanged. I want to store the results in 3 new columns called A_New, B_New, D_New.
So this is what I want to get:
A B C D A_New B_New D_New
1.4 4.0 6.0 1.0 2.4 4.0 2.0
2.5 1.5 2.4 2.3 5.5 2.5 5.3
3.0 1.7 2.5 3.4 8.0 2.7 8.4
I am struggling to create R code that will do this (preferably using the dplyr / tidyverse library). Please help.
As @Limey says in comments, dplyr::across() (+ case_when()) does everything you need ...
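dd <- read.table(header=TRUE, text = "
A B C D
1.4 4.0 6.0 1.0
2.5 1.5 2.4 2.3
3.0 1.7 2.5 3.4
")
library(dplyr)
## note: the |> pipe needs R >= 4.1 and .default needs dplyr >= 1.1.0
## for each of A, B, D, add a transformed copy named <column>_New
dd |>
    mutate(across(c(A, B, D),
                  .names = "{.col}_New",
                  ~ case_when(. < 2 ~ . + 1,
                              . < 3 ~ . + 3,
                              . < 4 ~ . + 5,
                              .default = .)))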
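Since the question asks for a function, the same pipeline can be wrapped in one. This is only a sketch: the helper name add_new_cols and its cols / suffix arguments are made up for illustration, and it assumes dplyr >= 1.1.0 (for .default) and R >= 4.1 (for |>).

```r
library(dplyr)

dd <- read.table(header = TRUE, text = "
A B C D
1.4 4.0 6.0 1.0
2.5 1.5 2.4 2.3
3.0 1.7 2.5 3.4
")

## Hypothetical helper: name and arguments are illustrative, not part of dplyr.
## `cols` is a character vector of column names; `suffix` is appended to each
## name to form the new column.
add_new_cols <- function(data, cols, suffix = "_New") {
    data |>
        mutate(across(all_of(cols),
                      ~ case_when(.x < 2 ~ .x + 1,
                                  .x < 3 ~ .x + 3,
                                  .x < 4 ~ .x + 5,
                                  .default = .x),
                      .names = paste0("{.col}", suffix)))
}

res <- add_new_cols(dd, c("A", "B", "D"))
print(res)
```

Passing the column names as strings through all_of() keeps the helper usable when the set of columns is only known at run time.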
The conditions in case_when are evaluated sequentially, so (for example) we don't need to test for x >= 2 in the second case. You could probably come up with an arithmetic shortcut for this particular example (e.g. "if 1 <= x < 4, add 2*floor(x) - 1 to x") (or something clever using cut() and a matching vector of increments), but it will be harder to understand and less generalizable ...
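For what it's worth, the cut() idea might look roughly like the sketch below. The names breaks, increments and add_increment are invented here for illustration; the bin edges simply restate the rules from the question.

```r
library(dplyr)

dd <- read.table(header = TRUE, text = "
A B C D
1.4 4.0 6.0 1.0
2.5 1.5 2.4 2.3
3.0 1.7 2.5 3.4
")

## Illustrative lookup: bin edges and the increment applied within each bin.
breaks     <- c(-Inf, 2, 3, 4, Inf)
increments <- c(1, 3, 5, 0)   # the last bin (x >= 4) is left unchanged

## Hypothetical helper: look up each value's bin, then add that bin's increment.
add_increment <- function(x) {
    ## right = FALSE gives half-open bins [lower, upper), matching
    ## x < 2, 2 <= x < 3, 3 <= x < 4; labels = FALSE returns integer bin codes
    bin <- cut(x, breaks = breaks, right = FALSE, labels = FALSE)
    x + increments[bin]
}

res_cut <- dd |>
    mutate(across(c(A, B, D), add_increment, .names = "{.col}_New"))
print(res_cut)
```

Changing the rules then means editing the two vectors rather than the case_when() branches, but the link between each threshold and its increment is less obvious at a glance.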