This is a totally constructed example; it is only meant to help me understand the conceptual differences.
I am running this code:
library(dplyr)
library(palmerpenguins)

penguins %>%
  group_by(species) %>%
  filter(
    if_else(
      species == "Adelie",
      if_else(
        n_distinct(island) > 1,
        row_number() == 1,
        row_number() == 2
      ),
      row_number() %in% 1:2
    )
  )
My thinking was that, when the species is Adelie, it would return the first row if that species occurs on more than one island and the second row otherwise. If the species is not Adelie, it would return the first two rows. However, I get this error:
Error in `filter()`:
ℹ In argument: `if_else(...)`.
ℹ In group 1: `species = Adelie`.
Caused by error in `if_else()`:
! `true` must have size 1, not size 152.
I do not completely understand this, because `row_number() == 1` returns either FALSE or TRUE on a per-row basis, doesn't it?
I know I can run it with `case_when` like this:
penguins %>%
  group_by(species) %>%
  filter(case_when(
    species == "Adelie" & n_distinct(island) > 1 ~ row_number() == 1,
    species == "Adelie" ~ row_number() == 2,
    .default = row_number() %in% 1:2
  ))
And it works. I thought `if_else` and `case_when` were both vectorized, but I guess I'm missing something basic here. I'd be super thankful for any hint.
The `true` and `false` legs of `if_else` are recycled to the length of the `condition` argument, but `condition` is not recycled to the length of `true` and `false`. Also, recycling only makes something longer; it never makes something shorter.
The condition and the `true` and `false` legs must have the same length after recycling. Here the condition `n_distinct(island) > 1` has length 1 and is not recycled, whereas the `true` and `false` legs have length greater than 1 (152 for the Adelie group), so we get an error.
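The recycling rule described above can be seen outside of any data frame. This is a minimal sketch using made-up vectors (the names `ok` and `failed` are only for illustration): a long condition with scalar legs works, while a scalar condition with long legs is rejected.

```r
library(dplyr)

# Length-3 condition, length-1 legs: the legs are recycled up to length 3.
ok <- if_else(c(TRUE, FALSE, TRUE), "yes", "no")

# Length-1 condition, length-3 legs: the condition is not recycled and the
# legs cannot be shortened, so if_else() throws an error, caught here.
failed <- tryCatch(
  if_else(TRUE, 1:3, 4:6),
  error = function(e) "error"
)
```

This is the same situation as in the question, where the scalar `n_distinct(island) > 1` meets legs as long as the group.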
We could replace the inner `if_else` with the following, using `rep` to recycle the condition ourselves to the length of the legs:
if_else(rep(n_distinct(island) > 1, n()), row_number() == 1, row_number() == 2)
but what we really want here is `if` rather than `if_else`, since `if` is the usual choice when the condition is a scalar:
if (n_distinct(island) > 1) row_number() == 1 else row_number() == 2
or possibly
row_number() == (if (n_distinct(island) > 1) 1 else 2)
so we have
penguins %>%
  group_by(species) %>%
  filter(
    if_else(
      species == "Adelie",
      if (n_distinct(island) > 1) row_number() == 1 else row_number() == 2,
      row_number() %in% 1:2
    )
  )
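As a quick check, the `if`-based filter and the `case_when` version from the question can be compared on a small data set. The tibble `toy` and its column `id` are hypothetical stand-ins for `penguins`, and the sketch assumes dplyr 1.1.0 or later because of `.default`. The `case_when` version avoids the error because `&` in base R recycles the scalar `n_distinct(island) > 1` to the length of `species == "A"`, so every condition already has the group's length.

```r
library(dplyr)

# Hypothetical toy data: species "A" spans two islands, "B" only one.
toy <- tibble(
  species = c("A", "A", "A", "B", "B", "B"),
  island  = c("x", "y", "x", "z", "z", "z"),
  id      = 1:6
)

# Scalar decision with if, vectorized decision with if_else.
with_if <- toy %>%
  group_by(species) %>%
  filter(
    if_else(
      species == "A",
      if (n_distinct(island) > 1) row_number() == 1 else row_number() == 2,
      row_number() %in% 1:2
    )
  ) %>%
  ungroup()

# The case_when formulation from the question, adapted to the toy data.
with_case_when <- toy %>%
  group_by(species) %>%
  filter(case_when(
    species == "A" & n_distinct(island) > 1 ~ row_number() == 1,
    species == "A" ~ row_number() == 2,
    .default = row_number() %in% 1:2
  )) %>%
  ungroup()
```

Both approaches should keep the first row of species "A" and the first two rows of species "B".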