r ggplot2 plot area stacked

Cumulative stacked area plot for counts in ggplot with R


I have a time series of policies that were adopted over the past few decades, and want to make a stacked area plot with cumulative policy counts, as they remain in force after adoption. I would like them to be grouped by organization, with time on the x and cumulative count on the y to show growth in policy adoption over time.

Data:

df <- data.frame(
  organization = c("a", "a", "c", "c", "a", "b"),
  year = c(1990, 1991, 1992, 1993, 1994, 1995),
  count = c(1, 1, 1, 0, 1, 1))

I have tried the following:

df %>%
  group_by(organization, year) %>%
  summarise(total = sum(count)) %>%
  ggplot(aes(x = year, y = cumsum(total), fill = factor(organization))) +
  geom_area(position = "stack")

Right now I get a plot that is not cumulative. I think it is because for some years there is no policy adopted.

I am interested in getting something that looks like the stacked area chart at https://www.r-graph-gallery.com/136-stacked-area-chart.html

I would really appreciate any help!!!


Solution

  • For each organization, you'll want at least one count value for the minimum and maximum years, so that ggplot2 can fill in the gaps between them. You'll also want to be careful when computing cumulative sums. The solution below adds a zero count wherever no value exists, including the earliest and latest years.

    I've added some code to automate adding rows for organizations that don't have data for all years of your data. To use it with your own data, join your data onto the complete_dat data frame, and change the variables in the definition of dat and the year range passed to expand() to suit your own data.
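
    To see why the cumulative sum needs care, here is a minimal base R sketch (my own illustration, not part of the reprex). As far as I understand, an expression like cumsum(total) inside aes() is evaluated over the whole data frame, so the running total keeps climbing across organizations instead of restarting for each one. The names overall and per_org are just illustrative.

    ```r
    # Same sample data as in the reprex, sorted by organization and then year
    dat <- data.frame(
      organization = c("a", "a", "a", "b", "b", "c", "c"),
      year         = c(1990, 1991, 1994, 1991, 1995, 1992, 1993),
      count        = c(1, 1, 1, 1, 1, 1, 0)
    )

    # Running total over the whole column: it never restarts between organizations
    overall <- cumsum(dat$count)

    # Running total restarted within each organization, using base R's ave()
    per_org <- ave(dat$count, dat$organization, FUN = cumsum)

    # Compare the two side by side
    cbind(dat, overall, per_org)
    # overall: 1 2 3 4 5 6 6
    # per_org: 1 2 3 1 2 1 1
    ```

    The per_org column is the quantity you want on the y axis. The reprex that follows computes the same thing with dplyr, after filling in the missing years.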

    library(ggplot2)
    library(dplyr)
    library(tidyr)
    
    # Create sample data
    dat <- tribble(
      ~organization, ~year, ~count,
      "a", 1990, 1,
      "a", 1991, 1,
      "b", 1991, 1,
      "c", 1992, 1,
      "c", 1993, 0,
      "a", 1994, 1,
      "b", 1995, 1
    )
    dat
    #> # A tibble: 7 x 3
    #>   organization  year count
    #>   <chr>        <dbl> <dbl>
    #> 1 a             1990     1
    #> 2 a             1991     1
    #> 3 b             1991     1
    #> 4 c             1992     1
    #> 5 c             1993     0
    #> 6 a             1994     1
    #> 7 b             1995     1
    
    # NOTE: incorrect results, shown for comparison
    dat %>%
      group_by(organization, year) %>%
      summarise(total = sum(count)) %>%
      ggplot(aes(x = year, y = cumsum(total), fill = organization)) +
      geom_area()
    #> `summarise()` regrouping output by 'organization' (override with `.groups` argument)
    

    
    # Fill out all years and organization combinations
    complete_dat <- tidyr::expand(dat, organization, year = 1990:1995)
    complete_dat
    #> # A tibble: 18 x 2
    #>    organization  year
    #>    <chr>        <int>
    #>  1 a             1990
    #>  2 a             1991
    #>  3 a             1992
    #>  4 a             1993
    #>  5 a             1994
    #>  6 a             1995
    #>  7 b             1990
    #>  8 b             1991
    #>  9 b             1992
    #> 10 b             1993
    #> 11 b             1994
    #> 12 b             1995
    #> 13 c             1990
    #> 14 c             1991
    #> 15 c             1992
    #> 16 c             1993
    #> 17 c             1994
    #> 18 c             1995
    
    # Update data so that counting works and fills in gaps
    final_dat <- complete_dat %>%
      left_join(dat, by = c("organization", "year")) %>%
      replace_na(list(count = 0)) %>%  # Replace NA with zeros
      arrange(organization, year) %>%  # Sort by year so the running sum is in time order
      group_by(organization) %>%       # Restart the running sum for each organization
      mutate(aggcount = cumsum(count))
    final_dat
    #> # A tibble: 18 x 4
    #> # Groups:   organization [3]
    #>    organization  year count aggcount
    #>    <chr>        <dbl> <dbl>    <dbl>
    #>  1 a             1990     1        1
    #>  2 a             1991     1        2
    #>  3 a             1992     0        2
    #>  4 a             1993     0        2
    #>  5 a             1994     1        3
    #>  6 a             1995     0        3
    #>  7 b             1990     0        0
    #>  8 b             1991     1        1
    #>  9 b             1992     0        1
    #> 10 b             1993     0        1
    #> 11 b             1994     0        1
    #> 12 b             1995     1        2
    #> 13 c             1990     0        0
    #> 14 c             1991     0        0
    #> 15 c             1992     1        1
    #> 16 c             1993     0        1
    #> 17 c             1994     0        1
    #> 18 c             1995     0        1
    
    # Plot results
    final_dat %>%
      ggplot(aes(x = year, y = aggcount, fill = organization)) +
      geom_area()
    

    Created on 2020-12-10 by the reprex package (v0.3.0)
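
    If you prefer a more compact version, the expand(), left_join() and replace_na() steps could probably be collapsed into a single tidyr::complete() call. This is only a sketch of that alternative, assuming the same column names as above. The name final_dat2 is hypothetical, and full_seq(year, 1) assumes your years are whole numbers one year apart.

    ```r
    library(dplyr)
    library(tidyr)

    # Same sample data as in the reprex above
    dat <- tribble(
      ~organization, ~year, ~count,
      "a", 1990, 1,
      "a", 1991, 1,
      "b", 1991, 1,
      "c", 1992, 1,
      "c", 1993, 0,
      "a", 1994, 1,
      "b", 1995, 1
    )

    final_dat2 <- dat %>%
      # Add a zero-count row for every missing organization-year combination
      complete(organization, year = full_seq(year, 1), fill = list(count = 0)) %>%
      group_by(organization) %>%
      arrange(year, .by_group = TRUE) %>%  # Time order within each organization
      mutate(aggcount = cumsum(count)) %>% # Running total per organization
      ungroup()

    final_dat2
    ```

    The result should have the same 18 rows and the same aggcount values as final_dat, so it can be passed to the same ggplot() call with geom_area().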