I have a dataset with three columns country, years, and growth. I want to identify periods when a country experienced >=6% growth for 3 or more years, alongside another column with the mean growth during that period.
I need to create a data frame like: Albania 1999-2001, 0.133 (namely country, period, and mean_growth columns), etc.
I filtered the original data frame to show the years where growth was >= 0.06. Here's one country:
country cow_code years growth
1 Albania 339 1999 0.158
2 Albania 339 2000 0.106
3 Albania 339 2001 0.136
4 Albania 339 2003 0.123
5 Albania 339 2004 0.0930
6 Albania 339 2006 0.228
7 Albania 339 2007 0.238
8 Albania 339 2008 0.178
9 Albania 339 2010 0.0744
10 Albania 339 2018 0.0825
11 Albania 339 2019 0.137
12 Albania 339 2021 0.181
I want to produce something like this for all the countries that are in my original dataframe:
country period mean_growth
1 Albania 1999-2001 0.133
2 Albania 2006-2008 0.215
Any clue how I can do this?
You can use cumsum(c(1, diff(years) != 1))
to split the data into the groups of consecutive years by countries.
library(dplyr)
df %>%
mutate(grp = cumsum(c(1, diff(years) != 1)), .by = country) %>%
group_by(country, grp) %>%
filter(n() >= 3) %>%
summarise(period = paste(range(years), collapse = '-'),
mean_growth = mean(growth), .groups = "drop") %>%
select(-grp)
# # A tibble: 2 × 3
# country period mean_growth
# <chr> <chr> <dbl>
# 1 Albania 1999-2001 0.133
# 2 Albania 2006-2008 0.215
df <- read.table(text = "
country cow_code years growth
1 Albania 339 1999 0.158
2 Albania 339 2000 0.106
3 Albania 339 2001 0.136
4 Albania 339 2003 0.123
5 Albania 339 2004 0.0930
6 Albania 339 2006 0.228
7 Albania 339 2007 0.238
8 Albania 339 2008 0.178
9 Albania 339 2010 0.0744
10 Albania 339 2018 0.0825
11 Albania 339 2019 0.137
12 Albania 339 2021 0.181")