I have some data to plot, which for purposes of organization and annotation are broken into several blocks, defined by a combination of variables including the one I'm calling group in my code. The idea is that I want to plot the first batch of group "A" differently from the second batch of "A", and the same for "B", "C", etc. So I want to mark each batch with a unique identifier, which here I'm calling plot_group. To do this, I'd like to march along the length of group and increment plot_group by 1 every time I move from one group to another.
I've figured out how to do this with a for loop, below, but it looks ugly and I'd rather be able to do it with a vectorized function. I can't for the life of me figure out how, though, even using things like seq_along and lag, and the problem seems to be that a function can't refer to its own output on the fly.
There must be a dumb and obvious thing I'm missing, since this is hardly a sophisticated problem. Does anyone have a recommendation?
# tibble() is used below to display the result
library(tibble)
# vector of groups - repeated twice
group <- c(rep(c(rep('A', 2), rep('B', 4), rep('C', 3)),2))
# run through the "group" variable, incrementing plot_group by one every time a new group is encountered
for (i in seq_along(group)) {
  # if we are at the beginning, initiate the first group as "1"
  if (i == 1) plot_group <- 1
  # otherwise, check if we are at a new group - if so, increment plot_group by 1
  else {
    if (group[i] != group[i-1]) plot_group <- c(plot_group, plot_group[i-1] + 1)
    # if not, then just return the current plot_group variable
    else plot_group <- c(plot_group, plot_group[i-1])
  }
}
tibble(group = group, plot_group = plot_group)
# returns what I want:
# # A tibble: 18 × 2
#    group plot_group
#    <chr>      <dbl>
#  1 A              1
#  2 A              1
#  3 B              2
#  4 B              2
#  5 B              2
#  6 B              2
#  7 C              3
#  8 C              3
#  9 C              3
# 10 A              4
# 11 A              4
# 12 B              5
# 13 B              5
# 14 B              5
# 15 B              5
# 16 C              6
# 17 C              6
# 18 C              6
rm(plot_group)
# do the same as the above, but with sapply
plot_group <- sapply(seq_along(group), function(i) {
  if (i == 1) return(1)
  else {
    if (group[i] != group[i-1]) return(plot_group[i-1] + 1)
    else return(plot_group[i-1])
  }
})
# returns "Error in FUN(X[[i]], ...) : object 'plot_group' not found"
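The sapply attempt fails because plot_group is only assigned once sapply() has returned, so the anonymous function has nothing to look up while it runs. One way around the self-reference is to flag every position where the value differs from the previous one and take a running sum of those flags. This is a minimal base R sketch of that idea, not the only way to do it; the name is_new_run is made up for illustration:

```r
# same example vector as in the question
group <- rep(c(rep("A", 2), rep("B", 4), rep("C", 3)), 2)

# TRUE wherever a new run starts; the first element always starts a run
# (is_new_run is a hypothetical helper name, not from the question)
is_new_run <- c(TRUE, group[-1] != group[-length(group)])

# the running count of run starts is the batch identifier
plot_group <- cumsum(is_new_run)
plot_group
```

Each element only depends on group, never on earlier values of plot_group, which is why this vectorizes cleanly.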
Using base::rle:
group <- c(rep(c(rep('A', 2), rep('B', 4), rep('C', 3)),2))
# rle() returns $lengths and $values; spell out $lengths rather than relying on partial matching
data.frame(group,
           plot_group = rle(group)$lengths |>
             (\(.x) rep(seq_along(.x), .x))())
#>    group plot_group
#> 1      A          1
#> 2      A          1
#> 3      B          2
#> 4      B          2
#> 5      B          2
#> 6      B          2
#> 7      C          3
#> 8      C          3
#> 9      C          3
#> 10     A          4
#> 11     A          4
#> 12     B          5
#> 13     B          5
#> 14     B          5
#> 15     B          5
#> 16     C          6
#> 17     C          6
#> 18     C          6
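To make the rle step easier to follow, here is the same computation unrolled into intermediate variables. It is only an illustrative sketch; the name runs is an assumption introduced here:

```r
group <- rep(c(rep("A", 2), rep("B", 4), rep("C", 3)), 2)

# rle() collapses the vector into runs of identical consecutive values
runs <- rle(group)
runs$values   # the value of each run, in order
runs$lengths  # how many times each run's value repeats

# number the runs 1..n, then repeat each number by its run length
plot_group <- rep(seq_along(runs$lengths), times = runs$lengths)
plot_group
```

Because rle() treats the second stretch of "A" as a separate run from the first, the two batches get different identifiers.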
or in dplyr:
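library(dplyr)
# consecutive_id() needs dplyr 1.1.0 or later; naming the column avoids a column called `consecutive_id(group)`
tibble(group) %>%
  mutate(plot_group = consecutive_id(group))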
Created on 2025-03-18 with reprex v2.1.1
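If consecutive_id() is not available (it was added in dplyr 1.1.0), the lag-based idea from the question can still be made to work by combining lag() with cumsum(). This is a sketch under the assumption that group contains no NA values:

```r
library(dplyr)

group <- rep(c(rep("A", 2), rep("B", 4), rep("C", 3)), 2)

res <- tibble(group) %>%
  mutate(
    # compare each value with the previous one; the default makes the
    # first comparison FALSE, so numbering starts at 1 after adding 1L
    plot_group = cumsum(group != lag(group, default = first(group))) + 1L
  )
res
```

The comparison only looks at group itself, so nothing has to refer to its own partial output.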