I am trying to get a large dataset prepped to create an ECharts4R sunburst visualization, which requires a hierarchical data structure. What is an efficient way to group by, summarize, and nest data multiple times?
Below is an example of how the data should look in its final prepped form (code from echarts4r sunburst documentation). They are building it from raw values:
# tibble hierarchical data representation
library(dplyr)
df <- tibble(
name = c("earth", "mars", "venus"),
value = c(30, 40, 30),
# 1st level
itemStyle = tibble(color = c(NA, "red", "blue")),
# embedded styles, optional
children = list(
tibble(
name = c("land", "ocean"),
value = c(10, 20),
# 2nd level
children = list(
tibble(name = c("forest", "river"), value = c(3, 7)),
# 3rd level
tibble(
name = c("fish", "kelp"),
value = c(10, 5),
children = list(
tibble(name = c("shark", "tuna"), value = c(2, 6)),
# 4th level
NULL # kelp
)
)
)
),
tibble(name = c("crater", "valley"), value = c(20, 20)),
NULL # venus
)
)
I am starting with a dataset:
Using mtcars, this is me trying to get to the same result using the columns am and gear as the root and level 1, respectively
library(tidyverse)
d1 <-
rownames_to_column(mtcars)
preppedData <-
d1 |>
# Root is am
group_by(
am
) |>
summarise(
mean_mpg = mean(mpg)
) |>
mutate(
# Level 1 is gear
children = list(
# AM = 0
bind_rows(
# Gear 3
d1 |>
filter(
am == 0,
gear == 3
) |>
summarise(mean_mpg = mean(mpg)),
## Gear 4
d1 |>
filter(
am == 0,
gear == 4
) |>
summarise(mean_mpg = mean(mpg)),
),
# AM = 1
bind_rows(
## Gear 4
d1 |>
filter(
am == 1,
gear == 4
) |>
summarise(mean_mpg = mean(mpg)),
## Gear 5
d1 |>
filter(
am == 1,
gear == 5
) |>
summarise(mean_mpg = mean(mpg))
)
)
)
This kind of coding is going to get unwieldy very fast. It is so repetitive, too, that I think there must be a nest + group_by + apply way to make this more efficient without having to use filter each time, but I rarely manipulate data this way to think of how I would do this.
Tim G, in the comments, points out that there is an easier data structure now that can be used for the sunburst plot.
https://github.com/JohnCoene/echarts4r/issues/207#issuecomment-718524703
df <- data.frame(parents = c("","earth", "earth", "mars", "mars", "land", "land", "ocean", "ocean", "fish", "fish", "Everything", "Everything", "Everything"),
labels = c("Everything", "land", "ocean", "valley", "crater", "forest", "river", "kelp", "fish", "shark", "tuna", "venus","earth", "mars"),
value = c(0, 30, 40, 10, 10, 20, 10, 20, 20, 8, 12, 10, 70, 20))
# create a tree object
universe <- data.tree::FromDataFrameNetwork(df)
# use it in echarts4r
universe %>%
e_charts() %>%
e_sunburst()