Tags: r, dplyr, filter, subset

Filter vector by proportion of negative values


I have a vector with ordered negative and positive values:

 x <- c(-35, -30, -25, -20, -15, -10, -5, -2, -1, -0.5, 0, 5, 22, 77)

I need to filter the values in the vector by a certain proportion of the negative values. For example, I want to keep all non-negative values but retain only the last third of the negative values. I do have a solution, but it looks awfully bulky:

library(dplyr)  # consecutive_id() requires dplyr >= 1.1.0

data.frame(x) %>%
  mutate(x_neg = ifelse(x < 0, x, NA),
         id = consecutive_id(x_neg),
         x_neg_length = length(x_neg[!is.na(x_neg)])) %>%
  filter(id > x_neg_length/3*2) %>%
  select(x)
     x
1 -5.0
2 -2.0
3 -1.0
4 -0.5
5  0.0
6  5.0
7 22.0
8 77.0

Is there a more concise/more elegant solution (preferably a dplyr one)?


Solution


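    Note the difference between floor() and ceiling(). With 10 negative values, 2/3 of them is 6.67. floor() therefore drops the first 6 values and keeps 4 negatives, while ceiling() drops 7 and keeps only 3. The floor() variant reproduces the output in the question.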

    I think base R is best here, but a {dplyr} solution is given as well. I assume that x (vector) or X (data frame) is sorted in ascending order by x, so all negative values come first.
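
The question asks for a dplyr pipeline, so the same floor() idea can also be written as a single filter() call. This is a minimal sketch under the same assumption that X is sorted by x. It recreates the example data so it runs on its own. Inside filter(), row_number() gives each row's position, and sum(x < 0) is evaluated on the whole x column.

```r
library(dplyr)

# Example data from the question, assumed sorted in ascending order by x
x <- c(-35, -30, -25, -20, -15, -10, -5, -2, -1, -0.5, 0, 5, 22, 77)
X <- data.frame(x, y = LETTERS[seq_along(x)])

# sum(x < 0) counts the negatives in the whole column (10 here);
# rows whose position exceeds floor(10 * 2/3) = 6 are kept
res <- X %>%
  filter(row_number() > floor(sum(x < 0) * 2/3))

res
```

Swap floor() for ceiling() to drop 7 rows instead of 6.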

    > # Base with floor()
    > X[-seq(floor(sum(x < 0) * 2/3)), ]
          x y
    7  -5.0 G
    8  -2.0 H
    9  -1.0 I
    10 -0.5 J
    11  0.0 K
    12  5.0 L
    13 22.0 M
    14 77.0 N
    > # tail(X, -floor(sum(x < 0) * 2/3))
    > 
    > # just indexing x 
    > x[-seq(floor(sum(x < 0) * 2/3))]
    [1] -5.0 -2.0 -1.0 -0.5  0.0  5.0 22.0 77.0
    > 
    > # Base with ceiling()
    > X[-seq(ceiling(sum(x < 0) * 2/3)), ]
          x y
    8  -2.0 H
    9  -1.0 I
    10 -0.5 J
    11  0.0 K
    12  5.0 L
    13 22.0 M
    14 77.0 N
    > # tail(X, -ceiling(sum(x < 0) * 2/3))
    > 
    > # just indexing x 
    > x[-seq(ceiling(sum(x < 0) * 2/3))]
    [1] -2.0 -1.0 -0.5  0.0  5.0 22.0 77.0
    > 
    > # dplyr::slice_tail + floor
    > dplyr::slice_tail(X, n = nrow(X) - floor(sum(x < 0) * 2/3)) # |> dplyr::pull(x)
         x y
    1 -5.0 G
    2 -2.0 H
    3 -1.0 I
    4 -0.5 J
    5  0.0 K
    6  5.0 L
    7 22.0 M
    8 77.0 N
    > 
    > # dplyr::slice_tail + ceiling
    > dplyr::slice_tail(X, n = nrow(X) - ceiling(sum(x < 0) * 2/3)) # |> dplyr::pull(x)
         x y
    1 -2.0 H
    2 -1.0 I
    3 -0.5 J
    4  0.0 K
    5  5.0 L
    6 22.0 M
    7 77.0 N
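
One caveat with -seq() concerns vectors with at most one negative value. In that case floor(sum(x < 0) * 2/3) is 0, and seq(0) returns c(1, 0), so x[-seq(0)] silently drops the first element. Switching to seq_len() does not help either, because x[-seq_len(0)] returns an empty vector. The helper below, drop_leading_neg(), is hypothetical and not part of the answer. It is a minimal sketch that uses logical indexing instead, assuming the vector is sorted in ascending order and contains no NA.

```r
# Hypothetical helper, not part of the answer above.
# Assumes v is sorted in ascending order and contains no NA.
drop_leading_neg <- function(v) {
  # Number of leading (negative) values to drop, rounded down as with floor() above
  n_drop <- floor(sum(v < 0) * 2/3)
  # Logical indexing is safe when n_drop is 0:
  # v[-seq(0)] would drop v[1], and v[-seq_len(0)] would return an empty vector
  v[seq_along(v) > n_drop]
}

x <- c(-35, -30, -25, -20, -15, -10, -5, -2, -1, -0.5, 0, 5, 22, 77)
drop_leading_neg(x)
#> [1] -5.0 -2.0 -1.0 -0.5  0.0  5.0 22.0 77.0

# Only one negative value: floor(1 * 2/3) is 0, so nothing is dropped
drop_leading_neg(c(-1, 0, 5))
#> [1] -1  0  5
```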
    

    Note

    x = c(-35, -30, -25, -20, -15, -10, -5, -2, -1, -0.5, 0, 5, 22, 77)
    X = data.frame(x, y = LETTERS[seq_along(x)])
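
The solutions above assume the data are sorted by x. If they are not, one way to drop the assumption is to rank by value with dplyr's row_number(x), which breaks ties by position, and then drop the floor(2/3 * n_neg) smallest values. That count never exceeds the number of negatives, so only negative values are removed, and the original row order is kept. This is a sketch only. The shuffled copy X_shuffled (built with set.seed() and sample()) exists just for illustration.

```r
library(dplyr)

x <- c(-35, -30, -25, -20, -15, -10, -5, -2, -1, -0.5, 0, 5, 22, 77)
X <- data.frame(x, y = LETTERS[seq_along(x)])

# Shuffle the rows to mimic data that is NOT sorted by x
set.seed(1)
X_shuffled <- X[sample(nrow(X)), ]

# row_number(x) ranks rows by value, so the floor(2/3 * n_neg)
# smallest values (all of them negative) are the ones filtered out
res <- X_shuffled %>%
  filter(row_number(x) > floor(sum(x < 0) * 2/3))

res
```

If the result should also be sorted, add arrange(x) to the pipeline.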