I have a data set where for each
region there is a value
pos, but some regions don't have a value of 1.
region <- c('A','A','B','B','C','D','D','E') pos <- c(0,1,0,1,0,0,1,0) df <- data.frame(region, pos)
For each region, I would like to select the row where the value of
pos is 1. If there is a region where there is no
pos value of 1, I would like to select the 0 row. So in essence I would like to have a resulting data set that has one row for each region but for the regions where this is a 1, I would like to select that row. There is never more than 2 rows for each region.
The result should look like this:
df < Region pos 1 A 1 2 B 1 3 C 0 4 D 1 5 E 0
Thanks in advance for your help.
If you're certain all values are 0s and 1s, you might be able to get away with:
library(dplyr) df %>% slice_max(pos, by = region, with_ties = FALSE) # region pos # 1 A 1 # 2 B 1 # 3 C 0 # 4 D 1 # 5 E 0
(You can verify this returns at most one row per region by using
bind_rows(df, df) instead of
df, just to demonstrate.)
If that's not safe, though, then something like
df %>% group_by(region) %>% filter(row_number() == c(which(pos == 1), which(pos == 0))) %>% ungroup() # # A tibble: 5 × 2 # region pos # <chr> <dbl> # 1 A 1 # 2 B 1 # 3 C 0 # 4 D 1 # 5 E 0
In this case, if a region has neither
pos == 1 nor
pos == 0 then the region will go away.