r date ggplot2 visualization aspect-ratio

How to stretch an axis of type date in R?


I am fairly new to R and currently working on visualizing some health data (BPM, intervals and so on). I have roughly 5 days' worth of data. So far I have had no problem plotting graphs of the progress throughout the day using ggplot (see fig. 1). Here's my code for that:

twentyeight %>%
  ggplot(aes(x = Date28, y = as.numeric(BPM28), group = 1)) +
  geom_line() +
  scale_x_datetime(
    date_breaks = "1 hour",
    date_labels = "%d.%m.%Y %H:%M",
    name = "Date"
  ) +
  scale_y_continuous(name = "BPM")

Figure 1

As I have ~1 data entry per second, and therefore thousands over a day, I am struggling to make the graph more visually pleasing. I'd like to stretch the x-axis wider to show finer details of the plot (even though I know this affects the ratio of the plot). I have tried using aspect.ratio like this:

+ theme(axis.text.x = element_text(angle = 90, vjust = 0.5), aspect.ratio = 0.2)

but this only results in a smaller plot. I have also tried coord_fixed(ratio = ...), but I get an error about incompatible types (ggplot and date). Is choosing a smaller time frame the only way?

Figure 2


Solution

  • Up front: The question is very much about data visualization, which is inherently more art than strict programming and therefore prone to opinion. I offer this answer merely as a short list of data-vis suggestions; I don't think they are necessarily the best or the only options.

    Here are some thoughts. For reference, I'm using my heart rate over 5 days as captured by my Garmin watch. (I apologize for not sharing the data.) Most of the data is every second, but there are a few instances of HR=0 (though I'm still alive) and some gaps; I'm filtering out the former and keeping the latter for demonstration.

    str(quux)
    # tibble [5,349 × 2] (S3: tbl_df/tbl/data.frame)
    #  $ timestamp : POSIXct[1:5349], format: "2025-06-01 16:59:00" "2025-06-01 17:00:00" "2025-06-01 17:01:00" "2025-06-01 17:02:00" ...
    #  $ heart_rate: int [1:5349] 52 54 50 49 50 51 47 49 46 47 ...
    

    A quick plot without any effort:

    library(dplyr)    # mutate(.by =), consecutive_id(), reframe(); needs dplyr >= 1.1.0
    library(ggplot2)
    ggplot(quux, aes(timestamp, heart_rate)) +
      geom_line()
    

    basic HR plot over 5 days

    As mentioned, there are a few things we can try to pretty-up here. I'll add two forms of smoothing, one using a rolling-mean and plotting it literally, one using geom_smooth(). Some of the parameters are just "from the hip", feel free to find values that work well with your data.

    quux |>
      mutate(smooth_HR = zoo::rollapply(heart_rate, 20, FUN = mean, align = "center", fill = NA)) |>
      ggplot(aes(timestamp, heart_rate)) +
      geom_line(color = "gray80") +
      geom_line(aes(y = smooth_HR), na.rm = TRUE) +
      geom_smooth(method = "gam", formula = y ~ splines::ns(x, 25))
    

    same data, two types of smoothing
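To make the rolling mean concrete, here is a minimal sketch on made-up numbers. It uses base R's stats::filter() rather than zoo::rollapply(), purely so it runs without extra packages; the toy vector hr and the window size k are assumptions for illustration, not values from the data above.

```r
# Toy heart-rate values with one spike (made up for illustration).
hr <- c(50, 52, 54, 90, 54, 52, 50)

# Centered moving average over a window of k samples.
# sides = 2 centers the window on each point; positions that do not
# have a full window on both sides come back as NA.
k <- 5
smooth_hr <- as.numeric(stats::filter(hr, rep(1 / k, k), sides = 2))

print(round(smooth_hr, 1))
# values: NA NA 60.0 60.4 60.0 NA NA
```

The spike of 90 is pulled down to about 60, which is the same effect the black line has relative to the gray raw line in the plot. A wider window smooths more but leaves more NA values at each end.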

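    There are a couple of gaps in the data (e.g., June 6). Let's stop drawing a line across any gap of more than 30 time units by giving each uninterrupted stretch its own group. Note that diff() on POSIXct values chooses its units automatically (seconds, minutes, ...), so check what 30 means for your data: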

    quux |>
      mutate(grp = consecutive_id(c(FALSE, diff(timestamp) > 30))) |>
      mutate(.by = grp, smooth_HR = zoo::rollapply(heart_rate, 20, FUN = mean, align = "center", fill = NA)) |>
      ggplot(aes(timestamp, heart_rate, group = grp)) +
      geom_line(color = "gray80") +
      geom_line(aes(y = smooth_HR), na.rm = TRUE) +
      geom_smooth(method = "gam", formula = y ~ splines::ns(x, 25))
    

    same data, with gaps in data no longer connected
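The grouping step can be checked in isolation. This is a minimal base-R sketch on made-up timestamps; it swaps consecutive_id() for cumsum(), a common alternative that starts a new group at each gap, and it converts the differences to seconds explicitly so the threshold does not depend on the units diff() happens to pick. The names ts, gap_secs and grp are hypothetical.

```r
# Seven made-up timestamps: four one second apart, then a ~2 minute gap,
# then three more one second apart.
ts <- as.POSIXct("2025-06-01 17:00:00", tz = "UTC") + c(0, 1, 2, 3, 120, 121, 122)

# Differences between neighbours, forced to seconds.
gap_secs <- as.numeric(diff(ts), units = "secs")

# TRUE marks the first point after a gap longer than 30 seconds;
# the running sum then numbers each uninterrupted stretch.
grp <- cumsum(c(FALSE, gap_secs > 30)) + 1L

print(grp)
# values: 1 1 1 1 2 2 2
```

Mapping grp to the group aesthetic, as in the plot above, makes geom_line() draw each stretch separately instead of joining points across the gap.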

    I can apply your theme for the x-axis, though (1) I think having the date in every label may be a waste of screen real estate, and (2) even ignoring the vertical compression of the data that the rotated labels cause, I don't think it adds much value for breaking out values or trends. Faceting by day may help break it out. I'm also introducing alldays, drawn with geom_blank() so that nothing is actually plotted, so that days with partial data still span a full day. I'm also changing the theme a bit; this can take more attention depending on your preferences and your data.

    alldays <- tibble(date = seq(min(as.Date(quux$timestamp)), max(as.Date(quux$timestamp)), by = "1 day")) |>
      reframe(.by = date, timestamp = as.POSIXct(paste(date, c("00:00:00", "23:59:59")), tz = "UTC"),
              heart_rate = NA_real_)
    quux |>
      mutate(
        date = as.Date(timestamp),
        grp = consecutive_id(c(FALSE, diff(timestamp) > 30))
      ) |>
      mutate(.by = grp, smooth_HR = zoo::rollapply(heart_rate, 20, FUN = mean, align = "center", fill = NA)) |>
      ggplot(aes(timestamp, heart_rate, group = grp)) +
      facet_wrap(date ~ ., scales = "free_x", ncol = 1, strip.position = "left") +
      geom_line(color = "gray80") +
      geom_line(aes(y = smooth_HR), na.rm = TRUE) +
      geom_smooth(method = "gam", formula = y ~ splines::ns(x, 25)) +
      geom_blank(aes(group = NULL), data = alldays) +
      scale_x_datetime(name = NULL, date_labels = "%H:%M:%S") +
      scale_y_continuous(name = "BPM") +
      theme_bw() +
      theme(panel.grid.minor.y = element_blank())
    

    same data, faceted by date, themed up
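On the literal question of stretching the x-axis: theme(aspect.ratio = ...) only fixes the panel's shape inside whatever space the graphics device gives it, which is why the plot shrinks instead of widening. One option is to make the output itself wider. The following is a minimal sketch, assuming ggplot2 is installed; the data frame demo, the file name and the dimensions are made up for illustration.

```r
library(ggplot2)

# Made-up per-second data for one hour, standing in for the real BPM series.
set.seed(1)
demo <- data.frame(
  timestamp = as.POSIXct("2025-06-01 00:00:00", tz = "UTC") + 0:3599,
  bpm = 60 + cumsum(rnorm(3600, sd = 0.3))
)

p <- ggplot(demo, aes(timestamp, bpm)) +
  geom_line() +
  scale_x_datetime(name = NULL, date_breaks = "10 min", date_labels = "%H:%M") +
  scale_y_continuous(name = "BPM")

# Write a wide, short image: the x-axis gets four times the room of the y-axis.
out_file <- file.path(tempdir(), "bpm_wide.png")
ggsave(out_file, p, width = 40, height = 10, units = "cm", dpi = 150)
```

The same idea applies to interactive devices and R Markdown chunks (fig.width and fig.height): the width-to-height ratio of the output is what stretches the axis, while the plot code stays unchanged.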

    Some wrap-up notes:

    1. I'm not suggesting that showing both smoothed lines is relevant; pick the one you want and discard the other.
    2. There are many smoothing techniques, including stats::smooth. I'm confident there are more appropriate methods to match the variability of this data and/or to achieve the results you want.
    3. I think it can be useful to keep the unsmoothed (raw) data in the plot (the background gray line), but it may be distracting. Over to you.
    4. Perhaps I need some more exercise ... :-)