I'm trying to create a stacked area plot but it looks bad (see link below).
Below is my data. The dates should be on the x-axis and the cases on the y-axis. However, the same date occurs multiple times with different numbers of cases. When this happens, I want the date to appear once, with the sum of the cases for that particular date (and for that particular type).
Note also that the stacked area plot must be split into the 3 types ("type" column in the data below).
My data looks like this:
# Groups:   type [3]
   Province.State Country.Region   Lat  Long date       cases type      loc    total cumsum
   <chr>          <chr>          <dbl> <dbl> <date>     <int> <chr>     <chr>  <int>  <int>
 1 ""             France            47     2 2020-01-24     2 confirmed Europe     2      2
 2 ""             France            47     2 2020-01-25     1 confirmed Europe     1      3
 3 ""             Germany           51     9 2020-01-27     1 confirmed Europe     1      4
 4 ""             France            47     2 2020-01-28     1 confirmed Europe     4      5
 5 ""             Germany           51     9 2020-01-28     3 confirmed Europe     4      8
 6 ""             Finland           64    26 2020-01-29     1 confirmed Europe     2      9
 7 ""             France            47     2 2020-01-29     1 confirmed Europe     2     10
 8 ""             Germany           51     9 2020-01-31     1 confirmed Europe     6     11
 9 ""             Italy             43    12 2020-01-31     2 confirmed Europe     6     13
10 ""             Sweden            63    16 2020-01-31     1 confirmed Europe     6     14
# ... with 378 more rows
Here's how the plot looks so far:
With the example data given and the description of the desired plot ...
I therefore made a stacked area plot of cumulative cases by date and type. Try this:
library(ggplot2)
library(dplyr)
dataset <- structure(list(
  id = c(
    "1", "2", "3", "4", "5", "6", "7", "8",
    "9", "10", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10"
  ),
  Province.State = c(
    "\"\"", "\"\"", "\"\"", "\"\"", "\"\"",
    "\"\"", "\"\"", "\"\"", "\"\"", "\"\"", "\"\"", "\"\"", "\"\"",
    "\"\"", "\"\"", "\"\"", "\"\"", "\"\"", "\"\"", "\"\""
  ),
  Country.Region = c(
    "France", "France", "Germany", "France",
    "Germany", "Finland", "France", "Germany", "Italy", "Sweden",
    "France", "France", "Germany", "France", "Germany", "Finland",
    "France", "Germany", "Italy", "Sweden"
  ), Lat = c(
    47L, 47L,
    51L, 47L, 51L, 64L, 47L, 51L, 43L, 63L, 47L, 47L, 51L, 47L,
    51L, 64L, 47L, 51L, 43L, 63L
  ), Long = c(
    2L, 2L, 9L, 2L, 9L,
    26L, 2L, 9L, 12L, 16L, 2L, 2L, 9L, 2L, 9L, 26L, 2L, 9L, 12L,
    16L
  ), date = structure(c(
    18285, 18286, 18288, 18289, 18289,
    18290, 18290, 18292, 18292, 18292, 18285, 18286, 18288, 18289,
    18289, 18290, 18290, 18292, 18292, 18292
  ), class = "Date"),
  cases = c(
    2L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 2L, 1L, 2L, 1L,
    1L, 1L, 3L, 1L, 1L, 1L, 2L, 1L
  ), type = c(
    "confirmed", "confirmed",
    "confirmed", "confirmed", "confirmed", "confirmed", "confirmed",
    "confirmed", "confirmed", "confirmed", "death", "death",
    "death", "death", "death", "death", "death", "death", "death",
    "death"
  ), loc = c(
    "Europe", "Europe", "Europe", "Europe",
    "Europe", "Europe", "Europe", "Europe", "Europe", "Europe",
    "Europe", "Europe", "Europe", "Europe", "Europe", "Europe",
    "Europe", "Europe", "Europe", "Europe"
  ), total = c(
    2L, 1L,
    1L, 4L, 4L, 2L, 2L, 6L, 6L, 6L, 2L, 1L, 1L, 4L, 4L, 2L, 2L,
    6L, 6L, 6L
  ), cumsum = c(
    2L, 3L, 4L, 5L, 8L, 9L, 10L, 11L,
    13L, 14L, 2L, 3L, 4L, 5L, 8L, 9L, 10L, 11L, 13L, 14L
  )
), class = c(
  "tbl_df",
  "tbl", "data.frame"
), row.names = c(NA, -20L))
dataset_plot <- dataset %>%
  # Number of cases by date, type
  count(date, type, wt = cases, name = "cases") %>%
  # Cumulated sum over time by type
  group_by(type) %>%
  arrange(date) %>%
  mutate(cumsum = cumsum(cases))

ggplot(dataset_plot, aes(date, cumsum, fill = type)) +
  geom_area()
Created on 2020-03-18 by the reprex package (v0.3.0)
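If you want the plot to show the summed cases per date rather than the running total, a minimal sketch along the same lines could look like the one below. The `toy` data frame is a made-up miniature of your data (a few rows that share a date and type), and the names `toy`, `daily` and `p` are only illustrative. I'm also assuming dplyr 1.0.0 or later for the `.groups` argument; on older versions, drop it and add `ungroup()` instead.

```r
library(dplyr)
library(ggplot2)

# Hypothetical miniature of the question's data:
# several rows share the same date and type
toy <- tibble(
  date  = as.Date(c("2020-01-28", "2020-01-28", "2020-01-29",
                    "2020-01-28", "2020-01-29")),
  cases = c(1L, 3L, 2L, 1L, 1L),
  type  = c("confirmed", "confirmed", "confirmed", "death", "death")
)

# Collapse to one row per date and type, holding the summed cases
daily <- toy %>%
  group_by(date, type) %>%
  summarise(cases = sum(cases), .groups = "drop")

# Stack the per-date sums (not the cumulative totals) by type
p <- ggplot(daily, aes(date, cases, fill = type)) +
  geom_area()
p
```

This is the same aggregation as the `count(date, type, wt = cases, name = "cases")` step above, just written out with `group_by()` and `summarise()`; the only real difference from the cumulative plot is that `cases` is mapped to the y-axis instead of `cumsum`.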