I have a time series of policies that were adopted over the past few decades, and want to make a stacked area plot with cumulative policy counts, as they remain in force after adoption. I would like them to be grouped by organization, with time on the x and cumulative count on the y to show growth in policy adoption over time.
Data:
df <- data.frame(
  organization = c("a", "a", "c", "c", "a", "b"),
  year = c(1990, 1991, 1992, 1993, 1994, 1995),
  count = c(1, 1, 1, 0, 1, 1))
I have tried the following:
df %>%
  group_by(organization, year) %>%
  summarise(total = sum(count)) %>%
  ggplot(aes(x = year, y = cumsum(total), fill = factor(organization))) +
  geom_area(position = "stack")
Right now I get a plot like this that is not cumulative -- I think it is because for some years there is no policy adopted.
I am interested in getting something that looks like this:
Image source: https://www.r-graph-gallery.com/136-stacked-area-chart.html
I would really appreciate any help!!!
For each organization, make sure there is a count value for at least the minimum and maximum years, so that ggplot2 can fill in the gaps in between. You also need to be careful with the cumulative sum: it has to be computed within each organization, not across the whole data frame. The solution below adds a zero count wherever no value exists for an organization and year, including the earliest and latest years.
I've added code that automates adding those rows for organizations that have no data in some years of your range.
To use it with your own data, join your data onto the complete_dat data frame and change the variables and year range in its definition to suit your own data.
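To see why the cumulative sum needs care, here is a minimal sketch. As far as I understand, an expression like cumsum(total) inside aes() is evaluated over the whole data frame in one pass, so the running total carries over from one organization to the next. The totals tibble and the names ungrouped and grouped are made up for this illustration.

```r
library(dplyr)

# Yearly totals per organization, mirroring the data in the question
totals <- tibble(
  organization = c("a", "a", "a", "b", "c", "c"),
  year         = c(1990, 1991, 1994, 1995, 1992, 1993),
  total        = c(1, 1, 1, 1, 1, 0)
)

# One running total over all rows: organization "b" starts where "a" ended
ungrouped <- cumsum(totals$total)

# A running total that restarts for each organization
grouped <- totals %>%
  group_by(organization) %>%
  mutate(running = cumsum(total)) %>%
  pull(running)

print(ungrouped)
print(grouped)
```

Only the grouped version gives each organization its own cumulative count, which is what a stacked area chart needs as input.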
library(ggplot2)
library(dplyr)
library(tidyr)
# Create sample data
dat <- tribble(
  ~organization, ~year, ~count,
  "a", 1990, 1,
  "a", 1991, 1,
  "b", 1991, 1,
  "c", 1992, 1,
  "c", 1993, 0,
  "a", 1994, 1,
  "b", 1995, 1
)
dat
#> # A tibble: 7 x 3
#> organization year count
#> <chr> <dbl> <dbl>
#> 1 a 1990 1
#> 2 a 1991 1
#> 3 b 1991 1
#> 4 c 1992 1
#> 5 c 1993 0
#> 6 a 1994 1
#> 7 b 1995 1
# NOTE incorrect results for comparison
dat %>%
  group_by(organization, year) %>%
  summarise(total = sum(count)) %>%
  ggplot(aes(x = year, y = cumsum(total), fill = organization)) +
  geom_area()
#> `summarise()` regrouping output by 'organization' (override with `.groups` argument)
# Fill out all years and organization combinations
complete_dat <- tidyr::expand(dat, organization, year = 1990:1995)
complete_dat
#> # A tibble: 18 x 2
#> organization year
#> <chr> <int>
#> 1 a 1990
#> 2 a 1991
#> 3 a 1992
#> 4 a 1993
#> 5 a 1994
#> 6 a 1995
#> 7 b 1990
#> 8 b 1991
#> 9 b 1992
#> 10 b 1993
#> 11 b 1994
#> 12 b 1995
#> 13 c 1990
#> 14 c 1991
#> 15 c 1992
#> 16 c 1993
#> 17 c 1994
#> 18 c 1995
# Update data so that counting works and fills in gaps
final_dat <- complete_dat %>%
  left_join(dat, by = c("organization", "year")) %>%
  replace_na(list(count = 0)) %>% # Replace NA with zeros
  arrange(organization, year) %>% # Arrange by year so adding works
  group_by(organization) %>% # Cumulate within each organization
  mutate(aggcount = cumsum(count))
final_dat
#> # A tibble: 18 x 4
#> # Groups: organization [3]
#> organization year count aggcount
#> <chr> <dbl> <dbl> <dbl>
#> 1 a 1990 1 1
#> 2 a 1991 1 2
#> 3 a 1992 0 2
#> 4 a 1993 0 2
#> 5 a 1994 1 3
#> 6 a 1995 0 3
#> 7 b 1990 0 0
#> 8 b 1991 1 1
#> 9 b 1992 0 1
#> 10 b 1993 0 1
#> 11 b 1994 0 1
#> 12 b 1995 1 2
#> 13 c 1990 0 0
#> 14 c 1991 0 0
#> 15 c 1992 1 1
#> 16 c 1993 0 1
#> 17 c 1994 0 1
#> 18 c 1995 0 1
# Plot results
final_dat %>%
  ggplot(aes(x = year, y = aggcount, fill = organization)) +
  geom_area()
Created on 2020-12-10 by the reprex package (v0.3.0)
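As a possible shortcut, the expand, join, and replace steps can probably be collapsed with tidyr::complete(). The sketch below is a variation on the code above, not a replacement for it, and it assumes there is at most one row per organization and year (otherwise summarise the counts first). Using full_seq() to derive the year range from the data instead of hard-coding 1990:1995 is my own choice here.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the reprex above
dat <- tribble(
  ~organization, ~year, ~count,
  "a", 1990, 1,
  "a", 1991, 1,
  "b", 1991, 1,
  "c", 1992, 1,
  "c", 1993, 0,
  "a", 1994, 1,
  "b", 1995, 1
)

final_dat <- dat %>%
  # Add every missing organization/year combination with a zero count;
  # full_seq() spans the observed minimum to maximum year in steps of 1
  complete(organization, year = full_seq(year, 1), fill = list(count = 0)) %>%
  arrange(organization, year) %>%
  group_by(organization) %>%
  mutate(aggcount = cumsum(count)) %>%
  ungroup()

print(final_dat)
```

The resulting final_dat should work with the same ggplot() and geom_area() call as above.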