dat <- data.frame(outcome = rnorm(25),
                  sex = sample(c("F", "M"), 25, replace = TRUE),
                  age_group = sample(c(1, 2, 3), 25, replace = TRUE))
> head(dat)
outcome sex age_group
1 1.1423 F 2
2 0.0998 M 1
3 -1.6305 F 2
4 -1.6759 F 1
5 0.3825 F 2
6 0.7274 F 3
I have a dataset that has a continuous outcome variable. I would like to obtain a LaTeX table of descriptive statistics for this variable, stratified by sex and age_group. I would like it to look something like this (it doesn't have to have mean (SD), but I want the layout of outcome stratified by age_group and sex):
I've tried the Hmisc package:
library(Hmisc)
output <- summaryM(outcome ~ sex + age_group, data = dat, test = TRUE)
latex(output, file = "")
but the output looks very different from what I want:
I'm more familiar with the gt package, and I highly recommend learning how to use it.
Here is a solution using the gt package and your example code.
# Install the package and load the dependencies. Here I'll be using dplyr to
# group by variables.
install.packages("gt")
library(gt)
library(dplyr)
dat <- data.frame(outcome = rnorm(25),
                  sex = sample(c("F", "M"), 25, replace = TRUE),
                  age_group = sample(c(1, 2, 3), 25, replace = TRUE))
head(dat) %>%
  # Group by the desired column
  group_by(sex) %>%
  # Create a gt table from the data frame
  gt() %>%
  # Rename columns
  cols_label(outcome = "",
             sex = "Sex",
             age_group = "Cohort") %>%
  # Add a table title
  # Note that md() lets you write the title in Markdown syntax (which also allows HTML)
  tab_header(title = md("Table 1: Descriptive Statistics (N = 6)")) %>%
  # Add a data source footnote
  tab_source_note(source_note = "Data: Stackoverflow question 7508787 [user: Adrian]") %>%
  # You can also customize the table's body and lines using the tab_options()
  # and tab_style() functions.
  tab_options(row.striping.include_table_body = FALSE) %>%
  tab_style(style = cell_borders(
              sides = c("top"),
              color = "black",
              weight = px(1),
              style = "solid"),
            locations = cells_body(
              columns = everything(),
              rows = everything()
            )) %>%
  # Finally, you can add summary rows with whichever statistics you want.
  summary_rows(
    groups = TRUE,
    columns = outcome,
    fns = list(
      average = "mean",
      total = "sum",
      SD = "sd")
  )
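The table above groups only by sex and shows just the first rows of dat. If you want the layout from the question (outcome stratified by both age_group and sex) as LaTeX, one possible sketch is to compute the summaries with dplyr, reshape them with tidyr, and pass the resulting gt table to as_latex(). The seed, the "mean (SD)" formatting, the use of tidyr, and the object names strat, tbl and latex_code are my own assumptions for illustration, not part of the original code.

```r
library(dplyr)
library(tidyr)
library(gt)

# Assumed seed, only so the example is reproducible
set.seed(42)
dat <- data.frame(outcome = rnorm(25),
                  sex = sample(c("F", "M"), 25, replace = TRUE),
                  age_group = sample(c(1, 2, 3), 25, replace = TRUE))

# One "mean (SD)" string per age_group x sex cell, then one column per sex.
# A cell with a single observation has no SD and will show "(NA)".
strat <- dat %>%
  group_by(age_group, sex) %>%
  summarise(stat = sprintf("%.2f (%.2f)", mean(outcome), sd(outcome)),
            .groups = "drop") %>%
  pivot_wider(names_from = sex, values_from = stat)

# Build the gt table; this assumes both "F" and "M" occur in the data
tbl <- strat %>%
  gt() %>%
  cols_label(age_group = "Age group") %>%
  tab_spanner(label = "Sex", columns = c("F", "M")) %>%
  tab_header(title = "Outcome, mean (SD), by age group and sex")

# Convert the table to LaTeX source and print it
latex_code <- as.character(as_latex(tbl))
cat(latex_code)
```

You can swap the sprintf() call for any other statistic (median, range, n) without changing the rest of the pipeline, since the reshaping only depends on there being one value per age_group and sex combination.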