I have a vector with ordered negative and positive values:
x <- c(-35, -30, -25, -20, -15, -10, -5, -2, -1, -0.5, 0, 5, 22, 77)
I need to filter the values in the vector by a certain proportion of the negative values. Say I want to keep all non-negative values (positives and zero) but retain only the last third of the negative values. I do have a solution, but it looks awfully bulky:
library(dplyr)

data.frame(x) %>%
  mutate(x_neg = ifelse(x < 0, x, NA),
         id = consecutive_id(x_neg),
         x_neg_length = length(x_neg[!is.na(x_neg)])) %>%
  filter(id > x_neg_length/3*2) %>%
  select(x)
x
1 -5.0
2 -2.0
3 -1.0
4 -0.5
5 0.0
6 5.0
7 22.0
8 77.0
Is there a more concise or more elegant solution (preferably a dplyr one)?
Note the difference between floor() and ceiling().
I think base R is best here. However, a {dplyr} solution is given as well. I assume that x (vector) or X (data frame) is sorted by x, so that all negative values come first.
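To see where the floor() and ceiling() results differ, the arithmetic can be checked directly. This is a small illustration only; n_neg is a helper name introduced here, not part of the answer.

```r
x <- c(-35, -30, -25, -20, -15, -10, -5, -2, -1, -0.5, 0, 5, 22, 77)

n_neg <- sum(x < 0)   # number of negative values: 10
n_neg * 2/3           # 6.666667, not a whole number
floor(n_neg * 2/3)    # 6: drop the first 6 negatives, keep the last 4
ceiling(n_neg * 2/3)  # 7: drop the first 7 negatives, keep the last 3
```

So "the last third" of 10 negatives means keeping 4 values with floor() and 3 with ceiling(); which one is right depends on how you want to round.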
> # Base with floor()
> X[-seq(floor(sum(x < 0) * 2/3)), ]
x y
7 -5.0 G
8 -2.0 H
9 -1.0 I
10 -0.5 J
11 0.0 K
12 5.0 L
13 22.0 M
14 77.0 N
> # equivalently: tail(X, -floor(sum(x < 0) * 2/3))
>
> # just indexing x
> x[-seq(floor(sum(x < 0) * 2/3))]
[1] -5.0 -2.0 -1.0 -0.5 0.0 5.0 22.0 77.0
>
> # Base with ceiling()
> X[-seq(ceiling(sum(x < 0) * 2/3)), ]
x y
8 -2.0 H
9 -1.0 I
10 -0.5 J
11 0.0 K
12 5.0 L
13 22.0 M
14 77.0 N
> # equivalently: tail(X, -ceiling(sum(x < 0) * 2/3))
>
> # just indexing x
> x[-seq(ceiling(sum(x < 0) * 2/3))]
[1] -2.0 -1.0 -0.5 0.0 5.0 22.0 77.0
>
> # dplyr::slice_tail + floor: keep all rows except the first floor(...) ones
> dplyr::slice_tail(X, n = nrow(X) - floor(sum(x < 0) * 2/3)) # |> dplyr::pull(x)
x y
1 -5.0 G
2 -2.0 H
3 -1.0 I
4 -0.5 J
5 0.0 K
6 5.0 L
7 22.0 M
8 77.0 N
>
> # dplyr::slice_tail + ceiling: keep all rows except the first ceiling(...) ones
> dplyr::slice_tail(X, n = nrow(X) - ceiling(sum(x < 0) * 2/3)) # |> dplyr::pull(x)
x y
1 -2.0 H
2 -1.0 I
3 -0.5 J
4 0.0 K
5 5.0 L
6 22.0 M
7 77.0 N
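One caveat about the -seq(...) indexing, which this data set does not trigger: if the number of negatives to drop rounds to 0 (for example a single negative value with floor()), seq(0) returns c(1, 0), so the first element is dropped anyway. Switching to seq_len() does not help either, because indexing with an empty negative vector selects nothing. A logical index avoids both problems. A minimal sketch with a made-up vector x2:

```r
# Made-up edge case: one negative value, so floor(1 * 2/3) = 0
x2 <- c(-1, 2, 3)
k <- floor(sum(x2 < 0) * 2/3)   # 0: nothing should be dropped

x2[-seq(k)]                     # seq(0) is c(1, 0), so x2[1] is dropped anyway: 2 3
x2[-seq_len(k)]                 # -integer(0) selects nothing: numeric(0)
x2[seq_along(x2) > k]           # logical index keeps everything: -1 2 3
```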
Note: the examples above use this data:
x = c(-35, -30, -25, -20, -15, -10, -5, -2, -1, -0.5, 0, 5, 22, 77)
X = data.frame(x, y = LETTERS[seq_along(x)])
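Since a dplyr solution was preferred, here is one more option that avoids slicing altogether (a sketch, not taken from the answer above). cumsum(x < 0) numbers the negative values 1, 2, 3, ... by position, so a single filter() condition keeps every non-negative value plus the negatives whose number exceeds the count to drop. It also keeps all negatives when that count is 0, and, assuming "last third" means last by position, it does not need the negatives to come first.

```r
library(dplyr)

x <- c(-35, -30, -25, -20, -15, -10, -5, -2, -1, -0.5, 0, 5, 22, 77)
X <- data.frame(x, y = LETTERS[seq_along(x)])

# Keep non-negatives unconditionally; keep a negative value only if its
# running count among negatives exceeds floor(2/3 of all negatives).
# Swap floor() for ceiling() to drop one more negative here.
kept <- X %>%
  filter(x >= 0 | cumsum(x < 0) > floor(sum(x < 0) * 2/3))
kept  # 8 rows: -5 (G) through 77 (N)

# The same condition works as a plain logical index in base R
x_kept <- x[x >= 0 | cumsum(x < 0) > floor(sum(x < 0) * 2/3)]
x_kept  # -5.0 -2.0 -1.0 -0.5 0.0 5.0 22.0 77.0
```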