Tags: r, ggplot2, stacked-area-chart

Stacked Area Plot with ggplot in R: How do I use only the highest y per corresponding x?


I'm trying to create a stacked area plot, but it looks bad (see the plot below).

Below is my data. The dates should be on the x-axis and the cases on the y-axis. However, the same date occurs multiple times with different numbers of cases. When this happens, I want the date to appear only once, with the sum of the cases for that particular date (and that particular type).

Note also that the stacked area plot must be split into the 3 types ("type" column in the data below).

My data looks like this:

# Groups:   type [3]
   Province.State Country.Region   Lat  Long date       cases type      loc    total cumsum
   <chr>          <chr>          <dbl> <dbl> <date>     <int> <chr>     <chr>  <int>  <int>
 1 ""             France            47     2 2020-01-24     2 confirmed Europe     2      2
 2 ""             France            47     2 2020-01-25     1 confirmed Europe     1      3
 3 ""             Germany           51     9 2020-01-27     1 confirmed Europe     1      4
 4 ""             France            47     2 2020-01-28     1 confirmed Europe     4      5
 5 ""             Germany           51     9 2020-01-28     3 confirmed Europe     4      8
 6 ""             Finland           64    26 2020-01-29     1 confirmed Europe     2      9
 7 ""             France            47     2 2020-01-29     1 confirmed Europe     2     10
 8 ""             Germany           51     9 2020-01-31     1 confirmed Europe     6     11
 9 ""             Italy             43    12 2020-01-31     2 confirmed Europe     6     13
10 ""             Sweden            63    16 2020-01-31     1 confirmed Europe     6     14
# ... with 378 more rows
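For reference, the per-date, per-type sum described above can be sketched in base R with `aggregate()`. This is only an illustration, not the method used in the answer: it assumes a data frame with `date`, `type` and `cases` columns, and `cases_df` is a hypothetical name holding just the ten rows printed above.

```r
# Hypothetical subset: the ten "confirmed" rows from the printout above,
# keeping only the columns needed for the aggregation
cases_df <- data.frame(
  date  = as.Date(c("2020-01-24", "2020-01-25", "2020-01-27", "2020-01-28",
                    "2020-01-28", "2020-01-29", "2020-01-29", "2020-01-31",
                    "2020-01-31", "2020-01-31")),
  type  = "confirmed",
  cases = c(2L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 2L, 1L)
)

# Collapse duplicate dates: one row per (date, type) holding the summed cases,
# sorted by date within each type
daily <- aggregate(cases ~ date + type, data = cases_df, FUN = sum)
daily
```

With only these ten rows typed in, the 2020-01-31 sum comes out as 4 rather than the 6 in the `total` column; the remaining rows for that date are presumably among the 378 not shown.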

Here's how the plot looks so far:

[Image: ugly stacked area plot so far]


Solution

  • Some notes on the example data and the description of the desired plot:

    1. For type = "death", I simply replicated the given data, just as an example.
    2. From the description, it was not entirely clear what the final plot should look like, e.g. whether you want to show different countries or locations.

    Therefore, I just made a stacked area plot of cumulative cases by date and type. Try this:

    library(ggplot2)
    library(dplyr)
    
    dataset <- structure(list(
      id = c(
        "1", "2", "3", "4", "5", "6", "7", "8",
        "9", "10", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10"
      ),
      Province.State = c(
        "\"\"", "\"\"", "\"\"", "\"\"", "\"\"",
        "\"\"", "\"\"", "\"\"", "\"\"", "\"\"", "\"\"", "\"\"", "\"\"",
        "\"\"", "\"\"", "\"\"", "\"\"", "\"\"", "\"\"", "\"\""
      ),
      Country.Region = c(
        "France", "France", "Germany", "France",
        "Germany", "Finland", "France", "Germany", "Italy", "Sweden",
        "France", "France", "Germany", "France", "Germany", "Finland",
        "France", "Germany", "Italy", "Sweden"
      ), Lat = c(
        47L, 47L,
        51L, 47L, 51L, 64L, 47L, 51L, 43L, 63L, 47L, 47L, 51L, 47L,
        51L, 64L, 47L, 51L, 43L, 63L
      ), Long = c(
        2L, 2L, 9L, 2L, 9L,
        26L, 2L, 9L, 12L, 16L, 2L, 2L, 9L, 2L, 9L, 26L, 2L, 9L, 12L,
        16L
      ), date = structure(c(
        18285, 18286, 18288, 18289, 18289,
        18290, 18290, 18292, 18292, 18292, 18285, 18286, 18288, 18289,
        18289, 18290, 18290, 18292, 18292, 18292
      ), class = "Date"),
      cases = c(
        2L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 2L, 1L, 2L, 1L,
        1L, 1L, 3L, 1L, 1L, 1L, 2L, 1L
      ), type = c(
        "confirmed", "confirmed",
        "confirmed", "confirmed", "confirmed", "confirmed", "confirmed",
        "confirmed", "confirmed", "confirmed", "death", "death",
        "death", "death", "death", "death", "death", "death", "death",
        "death"
      ), loc = c(
        "Europe", "Europe", "Europe", "Europe",
        "Europe", "Europe", "Europe", "Europe", "Europe", "Europe",
        "Europe", "Europe", "Europe", "Europe", "Europe", "Europe",
        "Europe", "Europe", "Europe", "Europe"
      ), total = c(
        2L, 1L,
        1L, 4L, 4L, 2L, 2L, 6L, 6L, 6L, 2L, 1L, 1L, 4L, 4L, 2L, 2L,
        6L, 6L, 6L
      ), cumsum = c(
        2L, 3L, 4L, 5L, 8L, 9L, 10L, 11L,
        13L, 14L, 2L, 3L, 4L, 5L, 8L, 9L, 10L, 11L, 13L, 14L
      )
    ), class = c(
      "tbl_df",
      "tbl", "data.frame"
    ), row.names = c(NA, -20L))
    
    dataset_plot <- dataset %>%
      # Number of cases by date, type
      count(date, type, wt = cases, name = "cases") %>%
      # Cumulated sum over time by type
      group_by(type) %>%
      arrange(date) %>%
      mutate(cumsum = cumsum(cases))
    
    ggplot(dataset_plot, aes(date, cumsum, fill = type)) +
      geom_area()
    

    Created on 2020-03-18 by the reprex package (v0.3.0)
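The title asks for the highest y per x, while the text asks for a sum per date. When y is the running total in the `cumsum` column (built within each type in date order, as the printout suggests), the two coincide: the last, highest running total on a date already includes every case up to that date. Below is a minimal base-R sketch under that assumption, checked against the sum-then-cumulate approach used above; `rows`, `by_max` and `daily` are hypothetical names, and only the ten printed "confirmed" rows are used.

```r
# Hypothetical subset: the ten "confirmed" rows from the question's printout
rows <- data.frame(
  date   = as.Date(c("2020-01-24", "2020-01-25", "2020-01-27", "2020-01-28",
                     "2020-01-28", "2020-01-29", "2020-01-29", "2020-01-31",
                     "2020-01-31", "2020-01-31")),
  type   = "confirmed",
  cases  = c(2L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 2L, 1L),
  cumsum = c(2L, 3L, 4L, 5L, 8L, 9L, 10L, 11L, 13L, 14L)
)

# Title's approach: keep only the highest running total per (date, type)
by_max <- aggregate(cumsum ~ date + type, data = rows, FUN = max)

# Sum-then-cumulate approach: daily sums, then a running total within each type
daily <- aggregate(cases ~ date + type, data = rows, FUN = sum)
daily <- daily[order(daily$type, daily$date), ]
daily$running <- ave(daily$cases, daily$type, FUN = cumsum)

# Same y values, provided the original running total was built in date order
all.equal(by_max$cumsum, daily$running)
# TRUE
```

If that assumption holds for the full data, `by_max` already has the `date`, `type` and `cumsum` columns the plot uses, so it should drop into the same `ggplot(aes(date, cumsum, fill = type)) + geom_area()` call.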