I have a data frame like the one below:
monkey = data.frame(girl = 1:10, kn = NA, boy = 5)
I want to understand what the following code does, step by step:
monkey %>%
mutate(t = ifelse(is.na(kn),.[,grepl('a',names(.))],ll))
Thank you everyone in advance for your support.
In my opinion, this is not good code, but I'll try to explain what it is doing.
is.na(kn) (in the context of monkey) returns a logical vector indicating whether each value in that column is NA:
with(monkey, is.na(kn))
# [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
The dot (.) in .[grepl(*)] refers to the data as it was at the start of this call to mutate; it would be more dplyr-canonical to use cur_data() (deprecated since dplyr 1.1.0 in favor of pick()), which is more complete (e.g., it takes into account columns created earlier in the same mutate, which . does not see; that is not a factor here). I believe this .[*] code is trying to select a column dynamically based on the current data.
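To make the difference between . and the current data concrete, here is a minimal sketch. It assumes dplyr 1.1.0 or later (for pick()), and the column names extra, n_dot, and n_pick are made up for illustration; they are not part of the question.

```r
library(dplyr)

monkey <- data.frame(girl = 1:10, kn = NA, boy = 5)

res <- monkey %>%
  mutate(
    extra  = 1,                         # hypothetical column added first
    n_dot  = ncol(.),                   # `.` is the frame that was piped in
    n_pick = ncol(pick(everything()))   # current data, including new columns
  )

# n_dot counts only the three original columns; n_pick also counts
# the columns created earlier in this same mutate()
unique(res$n_dot)
unique(res$n_pick)
```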
Why this one is bad:
1. There is no column here whose name contains "a";
2. There could be more than one column whose name contains "a", which means the yes= argument to ifelse would produce a nested frame in the new t= column;
3. The behavior of .[,*] changes depending on whether the original frame is a base-R data.frame or the tibble variant tbl_df: compare monkey[,1] with tibble(monkey)[,1].
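The three points above can be checked directly outside of mutate. This is only a sketch: the pattern "[gb]" is my own stand-in for a pattern that happens to match two columns, and the tibble part runs only if that package is installed.

```r
monkey <- data.frame(girl = 1:10, kn = NA, boy = 5)

# Point 1: no column name contains "a", so nothing is selected
none <- monkey[, grepl("a", names(monkey)), drop = FALSE]
ncol(none)

# Point 2: a pattern that matches two names ("girl" and "boy")
# returns a data frame rather than a single vector
two <- monkey[, grepl("[gb]", names(monkey))]
class(two)

# Point 3: a base data.frame drops a single column to a plain vector ...
class(monkey[, 1])

# ... whereas a tibble keeps it as a one-column tibble
if (requireNamespace("tibble", quietly = TRUE)) {
  print(class(tibble::as_tibble(monkey)[, 1]))
}
```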
The no= argument refers to an object ll that is not defined. This should (intuitively) fail with Error: object 'll' not found or similar, but since every element of the test= argument is TRUE, the no= argument is not needed and so it is never evaluated. Consider ifelse(c(TRUE, TRUE), 1:2, stop("oops")) (no error) versus ifelse(c(TRUE, FALSE), 1:2, stop("oops")) (error).
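If you want to run that comparison without the error aborting your session, a small sketch with tryCatch works; the variable names all_true and msg are just mine.

```r
# Every test is TRUE: `no` is never forced, so stop() is never called
all_true <- ifelse(c(TRUE, TRUE), 1:2, stop("oops"))
all_true

# One test is FALSE: `no` has to be evaluated, and the error surfaces
msg <- tryCatch(
  ifelse(c(TRUE, FALSE), 1:2, stop("oops")),
  error = function(e) conditionMessage(e)
)
msg
```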
Ultimately, this code is not defensive enough to be safe (base-vs-tibble variant), and its intent is unclear.
My advice when using dplyr is to use dplyr::if_else instead of base R's ifelse. For one, ifelse has some issues and limitations (e.g., How to prevent ifelse() from turning Date objects into numeric objects); for another, if_else protects you from ambiguous, inconsistent-results code such as that in your question.
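A rough illustration of both points, assuming dplyr is installed; the date is arbitrary and the variable names are mine.

```r
library(dplyr)

d <- as.Date("2024-01-01")   # arbitrary example date

# Base ifelse() strips the Date class and returns plain numbers
base_cls <- class(ifelse(c(TRUE, FALSE), d, d + 1))
base_cls

# if_else() keeps the Date class
dplyr_cls <- class(if_else(c(TRUE, FALSE), d, d + 1))
dplyr_cls

# if_else() evaluates both branches, so a broken `false` argument
# fails even when every condition is TRUE
strict <- tryCatch(
  if_else(c(TRUE, TRUE), 1:2, stop("oops")),
  error = function(e) "error"
)
strict
```

With if_else, a mistake like the undefined ll in your question would surface immediately instead of depending on the data.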