rggplot2pivotgeom

Geom_line and geom_point issues to distinguish variations over time from pivot_longer data in R


I have a biomarker for which I would like to model the relative variations (var, %) of concentrations (in g/L) over 7 successive times ('Ct0' to 'Ct6'), using geom_line and geom_point.

The basic data are these:

> dat0
# A tibble: 10 × 8
      id   Ct0   Ct1   Ct2   Ct3   Ct4   Ct5   Ct6
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1     1     9     6    NA    NA    NA    NA    NA
 2     2     6     8    12    NA    NA    NA    NA
 3     3    19    21    38    18    16     6    14
 4     4    36    27    25    12    13    21    24
 5     5    11    17    NA    NA    NA    NA    NA
 6     6    29    14    16     7    NA    NA    NA
 7     7     4    12    18    35    31    NA    NA
 8     8    32    30    29    NA    NA    NA    NA
 9     9    39    37    35    40    37    39    NA
10    10     2     3    21    22    NA    NA    NA  

The criteria to take into account are these:

  1. var are to be calculated relative to the previous Ct, not relative to the initial concentration Ct0 (except for the first variation Ct1 vs Ct0 of course),
  2. when 2 successive Ct are <= cut-off at 15 g/L, whatever var: the geom_line segment and the 2 geom_point are green,
  3. when one of the 2 successive Ct is >15 g/L and var is >=50% down or up, the geom_line segment and the 2 geom_point are red,
  4. when the 2 successive Ct are >15 g/L and var is >=20% down or up, the geom_line segment and the 2 geom_point are red,
  5. for the geom_points, red takes priority over green (i.e. if the previous var is green and the next var is red, or if the previous var is red and the next var is green, the geom_point at the intersection of the 2 geom_line segments is red in both cases),
  6. the id cases with at least one red var over the period are shown in front of the graph (i.e. not hidden by the green geom_line at the intersection points), and for geom_line segments or geom_points of distinct cases that may overlap, red always takes priority over green for both.

Here is my approach:

First, pivot_longer dat0:

dat1 <- dat0 |>
  pivot_longer(
  cols = c(2:8),
  names_to = "time",
  names_prefix = "Ct",
  values_to = "conc",
  values_drop_na = TRUE
  )
dat1$id <- as.factor(dat1$id)
dat1$time <- as.factor(dat1$time)

Second, mutate var:

dat1 <- dat1 |> 
  arrange(id, time) |>
  group_by(id) |>
  mutate(var = 100 * (conc - lag(conc)) / lag(conc)) |>
  ungroup()
dat1$var <- round(dat1$var, 1)

Third, mutate grcol (green=0 or red=1 color):

dat1 <- dat1 |> 
  arrange(id, time) |>
  group_by(id) |>
  mutate(
    grcol =
      case_when(
        !is.na(var) & lag(conc) <= 15 & conc <= 15 ~ 0,
        !is.na(var) & lag(conc) <= 15 & conc > 15 & abs(var) < 50 ~ 0,
        !is.na(var) & lag(conc) <= 15 & conc > 15 & abs(var) >= 50 ~ 1,
        !is.na(var) & lag(conc) > 15 & conc <= 15 & abs(var) < 50 ~ 0,
        !is.na(var) & lag(conc) > 15 & conc <= 15 & abs(var) >= 50 ~ 1,
        !is.na(var) & lag(conc) > 15 & conc > 15 & abs(var) < 20 ~ 0,
        !is.na(var) & lag(conc) > 15 & conc > 15 & abs(var) >= 20 ~ 1,
        TRUE ~ NA)) |> 
  ungroup()
dat1$grcol <- as.factor(dat1$grcol)  

Then plot:

grcol_color <- c("green", "red")
ggplot(dat1, aes(time, conc, group = id, colour = lead(grcol))) + 
  scale_colour_manual(values = grcol_color) +
  geom_hline(yintercept =  15) + # cut-off
  geom_line(linewidth = 1.5) +
  geom_point(size = 4)

Note that because of the var calculation based on lag in dat1, I use lead for the colour argument, otherwise the first segments are gray (because of the NAs at Ct0 in dat1) and the following ones do not have the expected colors (because they are shifted).

enter image description here

The graph is close to the goal; however, three problems persist:

  1. the last points are gray (because of colour = lead(grcol) a priori) whereas they should be the same color as the last segment,
  2. some red segments are hidden behind the green ones at the intersections whereas they should be in front.
  3. some red points are hidden behind the green ones whereas they should be in front.

How to solve these problems? Is the pivot_longer approach the most appropriate or is there a simpler way?

Thanks for help or advice, and for time

Data:

dat0 <-
structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), Ct0 = c(9, 
6, 19, 36, 11, 29, 4, 32, 39, 2), Ct1 = c(6, 8, 21, 27, 17, 14, 
12, 30, 37, 3), Ct2 = c(NA, 12, 38, 25, NA, 16, 18, 29, 35, 21
), Ct3 = c(NA, NA, 18, 12, NA, 7, 35, NA, 40, 22), Ct4 = c(NA, 
NA, 16, 13, NA, NA, 31, NA, 37, NA), Ct5 = c(NA, NA, 6, 21, NA, 
NA, NA, NA, 39, NA), Ct6 = c(NA, NA, 14, 24, NA, NA, NA, NA, 
NA, NA)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-10L))

Solution

  • I would focus here on reshaping your data frame such that each row represents one of the segments to be plotted. It is then easy to apply your rules to that segment. Draw two layers of points - one for the start points and one for the end points. Finally, draw those two points layers again but with only the "red" rows.

    dat0 %>%
      pivot_longer(-1, names_pattern = "Ct(.*)", names_to = "start",
                   values_to = "start_value") %>%
      mutate(start = as.numeric(start)) %>% 
      mutate(end = lead(start, 1), end_value = lead(start_value, 1), .by = "id") %>%
      filter(complete.cases(.)) %>%
      mutate(change = abs(1 - (end_value / start_value) )) %>%
      mutate(either = start_value > 15 | end_value > 15,
             both = start_value > 15 & end_value > 15) %>%
      mutate(color = ifelse((either & (change >= 0.5)) |
                            (both & (change >= 0.2)), "red", "green3")) %>%
      ggplot(aes(x = start, y = start_value, color = color)) +
      geom_segment(aes(xend = end, yend = end_value), linewidth = 1.5) +
      geom_point(size = 4) +
      geom_point(aes(x = end, y = end_value), size = 4) +
      geom_point(data = . %>% filter(color == "red"), size = 4) +
      geom_point(aes(x = end, y = end_value), data = . %>% filter(color == "red"),
                 size = 4) +
      scale_color_identity() +
      labs(x = "time", y = "conc") +
      theme_minimal(20)
    

    enter image description here