rggplot2survivalhazard

How to plot hazard function in R using ggplot2?


There's a book called "Event History Analysis with R" by G. Broström. But it provides minimal R code for whatever concepts are presented. I'd like to plot a hazard function, how do I do that? I want the exact same plots as given here. The plots shown are in this chapter. If possible, I'd like to do the same plots using ggplot2.

I've done survival and cumulative hazard plots using ggplot2 that look better, using the code:

library(survminer)

fit <- survfit(Surv(enter, exit, event) ~ male, data = df_final)

ggsurvplot(fit, title = "Survival Function") +
  xlab("Age in years") +
  ylab("Survival")

My data looks like this:

id male enter exit event
001 1 0.00 13.34 1
001 1 13.34 45.00 0
002 1 0.00 27.00 0
003 0 0.00 23.60 1
003 0 23.60 27.00 1
003 0 27.00 67.00 0

Solution

  • You can get the R codes to reproduce the figures in the book from the book's GitHub repo.

    For example, for the figures you are trying to replicate you could copy and paste the basic data wrangling code from the repo and convert the base R plotting code to ggplot2 like so where I also made use of a bit of dplyr, tidyr and patchwork to combine the subplots:

    library(eha)
    library(tidyverse)
    library(patchwork)
    
    females <- swepop[
      swepop$sex == "women" & swepop$year %in% 2001:2020,
      c("age", "pop", "year")
    ]
    females$deaths <- swedeaths[swedeaths$sex == "women" &
      swedeaths$year %in% 2001:2020, "deaths"]
    females <- aggregate(females[c("pop", "deaths")], by = females["age"], FUN = sum)
    females$mort <- females$deaths / females$pop
    
    males <- swepop[
      swepop$sex == "men" & swepop$year %in% 2001:2020,
      c("age", "pop")
    ]
    males$deaths <- swedeaths[swedeaths$sex == "men" &
      swedeaths$year %in% 2001:2020, "deaths"]
    males <- aggregate(males[c("pop", "deaths")], by = males["age"], FUN = sum)
    males$mort <- males$deaths / males$pop
    
    dat <- list(
      males = males,
      females = females
    ) |>
      dplyr::bind_rows(.id = "sex")
    
    p1 <- ggplot(dat, aes(age, mort, linetype = sex)) +
      geom_hline(yintercept = 0) +
      geom_line() +
      scale_linetype_discrete(
        labels = c(males = "Men", females = "Women"),
        breaks = c("males", "females")
      ) +
      labs(x = "Age", y = "Hazards", linetype = NULL) +
      theme_bw() +
      theme(
        legend.position = c(0, 1),
        legend.justification = c(0, 1),
        legend.box.background = element_rect(color = "black")
      )
    
    p2 <- dat |> 
      select(sex, age, mort) |> 
      pivot_wider(names_from = sex, values_from = mort) |> 
      mutate(hazard = males / females) |> 
      ggplot(aes(age, hazard)) +
      geom_hline(yintercept = 1) +
      geom_line() +
      labs(x = "Age", y = "Hazard ratio", linetype = NULL) +
      theme_bw() +
      theme(
        legend.position = c(0, 1),
        legend.justification = c(0, 1),
        legend.box.background = element_rect(color = "black")
      )
    
    p1 + p2 +
      plot_annotation(
        title = "Female and male mortality by age, Sweden 2001-2020.",
        subtitle = "The right hand panel shows the ratio between the male and female mortality by age."
      )
    

    enter image description here