rdataframesummarization

Summarize consecutive years by group based on specific conditions


I have a dataset with three columns country, years, and growth. I want to identify periods when a country experienced >=6% growth for 3 or more years, alongside another column with the mean growth during that period.

I need to create a data frame like: Albania 1999-2001, 0.133 (namely country, period, and mean_growth columns), etc.

I filtered the original data frame to show the years where growth was >= 0.06. Here's one country:

  country cow_code years growth
 1 Albania      339 1999  0.158 
 2 Albania      339 2000  0.106 
 3 Albania      339 2001  0.136 
 4 Albania      339 2003  0.123 
 5 Albania      339 2004  0.0930
 6 Albania      339 2006  0.228 
 7 Albania      339 2007  0.238 
 8 Albania      339 2008  0.178 
 9 Albania      339 2010  0.0744
10 Albania      339 2018  0.0825
11 Albania      339 2019  0.137 
12 Albania      339 2021  0.181 

I want to produce something like this for all the countries that are in my original dataframe:

   country    period mean_growth
 1 Albania 1999-2001       0.133
 2 Albania 2006-2008       0.215 

Any clue how I can do this?


Solution

  • You can use cumsum(c(1, diff(years) != 1)) to split the data into the groups of consecutive years by countries.

    library(dplyr)
    
    df %>%
      mutate(grp = cumsum(c(1, diff(years) != 1)), .by = country) %>%
      group_by(country, grp) %>%
      filter(n() >= 3) %>%
      summarise(period = paste(range(years), collapse = '-'),
                mean_growth = mean(growth), .groups = "drop") %>%
      select(-grp)
    
    # # A tibble: 2 × 3
    #   country period    mean_growth
    #   <chr>   <chr>           <dbl>
    # 1 Albania 1999-2001       0.133
    # 2 Albania 2006-2008       0.215
    

    Data
    df <- read.table(text = "
      country cow_code years growth
     1 Albania      339 1999  0.158 
     2 Albania      339 2000  0.106 
     3 Albania      339 2001  0.136 
     4 Albania      339 2003  0.123 
     5 Albania      339 2004  0.0930
     6 Albania      339 2006  0.228 
     7 Albania      339 2007  0.238 
     8 Albania      339 2008  0.178 
     9 Albania      339 2010  0.0744
    10 Albania      339 2018  0.0825
    11 Albania      339 2019  0.137 
    12 Albania      339 2021  0.181")