year <- c(2000,2000,2000,2001,2001,2001)
gender <- c("F","M","M","F","F","M")
grade <- c("A","B","C","C","B","A")
df <- data.frame(year,gender,grade)
I would like to make a summary table but minimize manual codes and automates the process as much as possible. In my project, I have 170 variables to summarise. I tried tidyverse group by but didn't get the result that I want. I will use xtable to move to latex file. (I tried add.to.row but failed to add "Gender" on the first line.)
Here is the outcome that I want.
Please help me to draw this table. I need the variable names in the table.
You can use pivot_longer
and summarise
to generate the summary values.
library(tidyverse)
df %>%
pivot_longer(-year) %>%
group_by(year, name, value) %>%
summarise(n = n()) %>%
mutate(prop = round(n / sum(n), 3) * 100)
# A tibble: 10 x 5
# Groups: year, name [4]
year name value n prop
<dbl> <chr> <chr> <int> <dbl>
1 2000 gender F 1 33.3
2 2000 gender M 2 66.7
3 2000 grade A 1 33.3
4 2000 grade B 1 33.3
5 2000 grade C 1 33.3
6 2001 gender F 2 66.7
7 2001 gender M 1 33.3
8 2001 grade A 1 33.3
9 2001 grade B 1 33.3
10 2001 grade C 1 33.3
You can also get closer to your desired table by joining values in a formatted string, and then using pivot_wider
:
df %>%
pivot_longer(-year) %>%
group_by(year, name, value) %>%
summarise(n = n()) %>%
mutate(prop = round(n / sum(n), 3) * 100,
summary_str = glue::glue("{n}({prop}%)")) %>%
pivot_wider(id_cols = c(name, value), names_from = "year",
values_from = "summary_str")
name value `2000` `2001`
<chr> <chr> <glue> <glue>
1 gender F 1(33.3%) 2(66.7%)
2 gender M 2(66.7%) 1(33.3%)
3 grade A 1(33.3%) 1(33.3%)
4 grade B 1(33.3%) 1(33.3%)
5 grade C 1(33.3%) 1(33.3%)