I have done a longitudinal study on children's letter knowledge development. In this study, 120 children were given a letter knowledge task at three different time points.
The first six rows of df:
> head(df)
id T1_date T1_letters T2_date T2_letters T3_date T3_letters
1 101 2022-10-17 4 2023-05-15 18 2023-12-11 26
2 102 2022-10-18 9 2023-05-15 20 2023-12-11 30
3 103 2022-10-17 14 2023-05-15 30 2023-12-11 30
4 104 2022-10-18 7 2023-05-15 17 2023-12-11 27
5 105 2022-10-17 1 2023-05-16 11 2023-12-12 26
6 106 2022-10-17 2 2023-05-15 11 2023-12-12 26
The first column ("id") in this dataset shows the participant ID numbers. As you can see, all of the time points (i.e., sessions) were on slightly different days. So, one child may have a different date for T3 compared to another child.
Now, I would like to plot the children's letter knowledge scores over time. Every child should be visualized as a single line (so: 120 thin lines, connecting the three scores for each child). The x-axis of this plot should represent the date of each session (with each session having a slightly different date), and the y-axis should represent the letter knowledge scores (these are on a scale from 0 to 34, because we also used digraphs).
At the moment, I don't need different colors for each participant, so it would be fine if all lines are black. (However, it would be nice to have the option to change the colors for specific subjects later.)
Data:
df = structure(list(id = c(101, 102, 103, 104, 105, 106, 201, 202,
203, 204, 205), T1_date = c("2022-10-17", "2022-10-18", "2022-10-17",
"2022-10-18", "2022-10-17", "2022-10-17", "2022-12-01", "2022-12-01",
"2022-12-01", "2022-11-23", "2022-11-23"), T1_letters = c(4,
9, 14, 7, 1, 2, 3, 8, 0, 3, 8), T2_date = c("2023-05-15", "2023-05-15",
"2023-05-15", "2023-05-15", "2023-05-16", "2023-05-15", "2023-03-28",
"2023-03-28", "2023-03-29", "2023-03-27", "2023-03-27"), T2_letters = c(18,
20, 30, 17, 11, 11, 4, 14, 4, 8, 8), T3_date = c("2023-12-11",
"2023-12-11", "2023-12-11", "2023-12-11", "2023-12-12", "2023-12-12",
"2023-09-21", "2023-09-21", "2023-09-21", "2023-09-18", "2023-09-18"
), T3_letters = c(26, 30, 30, 27, 26, 26, 10, 18, 8, 16, 18)), row.names = c(NA,
-11L), class = "data.frame")
You’ll need to reshape your data into a “long” format with one row per child per session. This is straightforward with the pivot_longer() function from the tidyr package.
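library(tidyverse)

# Split each column name at "_": the first part (T1, T2, T3) goes into
# `session`; ".value" makes the second part (date, letters) the name of
# an output column holding the corresponding values.
long <- df |>
  pivot_longer(
    cols = !id,
    names_to = c("session", ".value"),
    names_sep = "_"
  ) |>
  mutate(date = as.Date(date))  # parse the character dates as Date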
long
#> # A tibble: 33 × 4
#> id session date letters
#> <dbl> <chr> <date> <dbl>
#> 1 101 T1 2022-10-17 4
#> 2 101 T2 2023-05-15 18
#> 3 101 T3 2023-12-11 26
#> 4 102 T1 2022-10-18 9
#> 5 102 T2 2023-05-15 20
#> 6 102 T3 2023-12-11 30
#> 7 103 T1 2022-10-17 14
#> 8 103 T2 2023-05-15 30
#> 9 103 T3 2023-12-11 30
#> 10 104 T1 2022-10-18 7
#> # ℹ 23 more rows
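Before plotting, it may be worth confirming that the reshape did what you expect. The following is only a sketch: `df_small` and `long_small` are illustrative names, and the data are two children copied from the question so that the block runs on its own. With the full data, the same checks could be applied to `long`.

```r
library(tidyr)
library(dplyr)

# Two children from the question's data, enough to illustrate the reshape
df_small <- data.frame(
  id = c(101, 201),
  T1_date = c("2022-10-17", "2022-12-01"), T1_letters = c(4, 3),
  T2_date = c("2023-05-15", "2023-03-28"), T2_letters = c(18, 4),
  T3_date = c("2023-12-11", "2023-09-21"), T3_letters = c(26, 10)
)

long_small <- df_small |>
  pivot_longer(cols = !id, names_to = c("session", ".value"), names_sep = "_") |>
  mutate(date = as.Date(date))

# Each child should contribute exactly one row per session
sessions_per_child <- count(long_small, id)
stopifnot(all(sessions_per_child$n == 3))

# Dates should be real Date values with no parsing failures,
# and scores should stay within the 0-34 range of the task
stopifnot(inherits(long_small$date, "Date"), !anyNA(long_small$date))
stopifnot(all(long_small$letters >= 0 & long_small$letters <= 34))
```

If any of these checks fails on the real data, a missing session or a mistyped date is the likely cause, and it is easier to spot here than in the plot.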
With the data in long form, you can create the line graph with ggplot2, specifying the participant id as the group to get individual lines.
ggplot(long, aes(date, letters, group = id)) + geom_line()
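Since the question also asks for the option to colour specific subjects later, here is one possible way to do that, as a sketch rather than the only approach. It flags the chosen ids in a logical column and maps that column to colour. The names `highlight_ids` and `highlight`, the choice of id 102, the red colour, and the axis labels are all illustrative assumptions; the data are three children copied from the question so the block is self-contained.

```r
library(tidyverse)

# Three children from the question's data, for illustration only
df <- data.frame(
  id = c(101, 102, 201),
  T1_date = c("2022-10-17", "2022-10-18", "2022-12-01"),
  T1_letters = c(4, 9, 3),
  T2_date = c("2023-05-15", "2023-05-15", "2023-03-28"),
  T2_letters = c(18, 20, 4),
  T3_date = c("2023-12-11", "2023-12-11", "2023-09-21"),
  T3_letters = c(26, 30, 10)
)

long <- df |>
  pivot_longer(cols = !id, names_to = c("session", ".value"), names_sep = "_") |>
  mutate(date = as.Date(date))

# Hypothetical selection: the participant(s) to emphasise
highlight_ids <- c(102)

# Flag the rows belonging to highlighted participants
long <- long |>
  mutate(highlight = id %in% highlight_ids)

p <- ggplot(long, aes(date, letters, group = id, colour = highlight)) +
  # `linewidth` needs ggplot2 3.4.0 or later; older versions use `size`
  geom_line(linewidth = 0.3) +
  # Black for everyone else, red for the highlighted ids; no legend
  scale_colour_manual(values = c("FALSE" = "black", "TRUE" = "red"),
                      guide = "none") +
  # Show the full 0-34 range of the task on the y-axis
  scale_y_continuous(limits = c(0, 34)) +
  labs(x = "Session date", y = "Letter knowledge score")

p
```

Leaving `highlight_ids` empty (`c()`) should give the all-black plot, so the same code can serve both cases.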