I want to transform my dataframe into a format that would be suitable for a waterfall chart.
My dataframe is as follows:
employee <- c('A','B','C','D','E','F',
'A','B','C','D','E','F',
'A','B','C','D','E','F',
'A','B','C','D','E','F')
revenue <- c(10, 20, 30, 40, 10, 40,
8, 10, 20, 50, 20, 10,
2, 5, 70, 30, 10, 50,
40, 8, 30, 40, 10, 40)
date <- as.Date(c('2017-03-01','2017-03-01','2017-03-01',
'2017-03-01','2017-03-01','2017-03-01',
'2017-03-02','2017-03-02','2017-03-02',
'2017-03-02','2017-03-02','2017-03-02',
'2017-03-03','2017-03-03','2017-03-03',
'2017-03-03','2017-03-03','2017-03-03',
'2017-03-04','2017-03-04','2017-03-04',
'2017-03-04','2017-03-04','2017-03-04'))
df <- data.frame(date,employee,revenue)
date employee revenue
1 2017-03-01 A 10
2 2017-03-01 B 20
3 2017-03-01 C 30
4 2017-03-01 D 40
5 2017-03-01 E 10
6 2017-03-01 F 40
7 2017-03-02 A 8
8 2017-03-02 B 10
9 2017-03-02 C 20
10 2017-03-02 D 50
11 2017-03-02 E 20
12 2017-03-02 F 10
13 2017-03-03 A 2
14 2017-03-03 B 5
15 2017-03-03 C 70
16 2017-03-03 D 30
17 2017-03-03 E 10
18 2017-03-03 F 50
19 2017-03-04 A 40
20 2017-03-04 B 8
21 2017-03-04 C 30
22 2017-03-04 D 40
23 2017-03-04 E 10
24 2017-03-04 F 40
How do I transform this dataframe so that I can get it into a form for a waterfall chart in ggplot2?
The amount
column is the difference from the total day by employee.
The end
column is the start
column minus the amount
column.
The start
column is the Total
end values from previous day.
Final dataframe should look like this:
date employee start end amount total_for_day
1 2017-03-01 A 0 10 10 10
2 2017-03-01 B 0 20 20 20
3 2017-03-01 C 0 30 30 30
4 2017-03-01 D 0 40 40 40
5 2017-03-01 E 0 10 10 10
6 2017-03-01 F 0 40 40 40
7 2017-03-01 Total 0 150 150 150
8 2017-03-02 A 150 148 -2 8
9 2017-03-02 B 150 140 -10 10
10 2017-03-02 C 150 140 -10 20
11 2017-03-02 D 150 160 10 50
12 2017-03-02 E 150 160 10 20
13 2017-03-02 F 150 120 -30 10
14 2017-03-02 Total 150 118 -32 98
15 2017-03-03 A 118 112 -6 2
16 2017-03-03 B 118 113 -5 5
17 2017-03-03 C 118 168 50 70
18 2017-03-03 D 118 98 -20 30
19 2017-03-03 E 118 108 -10 10
20 2017-03-03 F 118 158 40 50
21 2017-03-03 Total 118 167 49 170
22 2017-03-04 A 167 205 38 40
23 2017-03-04 B 167 170 3 8
24 2017-03-04 C 167 127 -40 30
25 2017-03-04 D 167 177 10 40
26 2017-03-04 E 167 167 0 10
27 2017-03-04 F 167 157 -10 40
28 2017-03-04 Total 167 168 1 168
There are a few steps to get you to this, and I think that the dplyr
package will help (used heavily below).
My understanding is that revenue
gives the cumulative total revenue, rather than the daily change. If that is wrong, you would need to reverse some of these calculations.
The first step is to create a new data.frame that calculates the daily totals, then bind that back to the data.frame. Then, you can group_by
the employees (including "Total") and add columns that will be created separately for each employee (value on the previous day, the change, and then whether it was an increase or a decrease).
toPlot <-
bind_rows(
df
, df %>%
group_by(date) %>%
summarise(revenue = sum(revenue)) %>%
mutate(employee = "Total")
) %>%
group_by(employee) %>%
mutate(
previousDay = lag(revenue, default = 0)
, change = revenue - previousDay
, direction = ifelse(change > 0
, "Positive"
, "Negative"))
returns:
date employee revenue previousDay change direction
<date> <chr> <dbl> <dbl> <dbl> <chr>
1 2017-03-01 A 10 0 10 Positive
2 2017-03-01 B 20 0 20 Positive
3 2017-03-01 C 30 0 30 Positive
4 2017-03-01 D 40 0 40 Positive
5 2017-03-01 E 10 0 10 Positive
6 2017-03-01 F 40 0 40 Positive
7 2017-03-02 A 8 10 -2 Negative
8 2017-03-02 B 10 20 -10 Negative
9 2017-03-02 C 20 30 -10 Negative
10 2017-03-02 D 50 40 10 Positive
# ... with 18 more rows
Then, we can plot that using:
toPlot %>%
ggplot(aes(xmin = date - 0.5
, xmax = date + 0.5
, ymin = previousDay
, ymax = revenue
, fill = direction)) +
geom_rect(col = "black"
, show.legend = FALSE) +
facet_wrap(~employee
, scale = "free_y") +
scale_fill_brewer(palette = "Set1")
to give
Note that including "Total" throws off the scale (requiring the free scales), so I would prefer to omit it:
toPlot %>%
filter(employee != "Total") %>%
ggplot(aes(xmin = date - 0.5
, xmax = date + 0.5
, ymin = previousDay
, ymax = revenue
, fill = direction)) +
geom_rect(col = "black"
, show.legend = FALSE) +
facet_wrap(~employee) +
scale_fill_brewer(palette = "Set1")
For this to allow direct comparsion between employees
and this for the overall total
toPlot %>%
filter(employee == "Total") %>%
ggplot(aes(xmin = date - 0.5
, xmax = date + 0.5
, ymin = previousDay
, ymax = revenue
, fill = direction)) +
geom_rect(col = "black"
, show.legend = FALSE) +
scale_fill_brewer(palette = "Set1")
though I still find line graphs to be easier to interpret (especially comparing employees):
toPlot %>%
filter(employee != "Total") %>%
ggplot(aes(x = date
, y = revenue
, col = employee)) +
geom_line() +
scale_fill_brewer(palette = "Dark2")
If you want to plot the changes themselves by day, you can do:
toPlot %>%
filter(employee != "Total") %>%
ggplot(aes(x = date
, y = change
, fill = employee)) +
geom_col(position = "dodge") +
scale_fill_brewer(palette = "Dark2")
to get:
but now you are getting rather far from the "waterfall" plot outputs. If you really, really want to make a waterfall comparable across plots you can, but it is going to be rather ugly (I'd strongly recommend the line plot above instead).
Here, you need to manually move the boxes around, and this will require some tinkering if you change the output aspect ratio (or size) or the number of employees. You also need to include colors for both the employee and the direction of the change, which starts to look rough. This falls into the category of "can, but probably shouldn't" -- there is likely a better way to display these data.
toPlot %>%
filter(employee != "Total") %>%
ungroup() %>%
mutate(empNumber = as.numeric(as.factor(employee))) %>%
ggplot(aes(xmin = (empNumber) - 0.4
, xmax = (empNumber) + 0.4
, ymin = previousDay
, ymax = revenue
, col = direction
, fill = employee)) +
geom_rect(size = 1.5) +
facet_grid(~date) +
scale_fill_brewer(palette = "Dark2") +
theme(axis.text.x = element_blank()
, axis.ticks.x = element_blank())
gives