Counting start and end states with a pivot table in R

I have a table of data with studentIDs,TestingWindow, and BenchmarkCategories. Values for TestingWindow are "Start of Year" or "End of Year." Values for the TestingData are Urgent Intervention, Intervention, On Watch, At Benchmark, Above Benchmark.

Here's some sample data. Notice student 123 is missing a result of End of Year.

studentIDs	ScreeningPeriodWindowName <chr>	DistrictBenchmarkCategoryName <chr>
123	StartofYear	Urgent Intervention
456	StartofYear	Intervention
456	EndofYear	At Benchmark
789	StartofYear	On Watch
789	EndofYear	Above Benchmark

I want to display the movement of students between BenchmarkCategories between the start and end of year. I'd like a table to show this. The key is to show changes for individual students, rather than total counts. "How many students moved from Urgent Intervention to On Watch?"

Here's what the table should look like, with Column A the start of year categories and Row 1 the End of Year Categories. (middle cells blank but I think you see the pattern)

v start end>	Urgent Intervention	Intervention	On Watch	At Benchmark	Above Benchmark
Urgent Intervention	count of Urgent to Urgent	count of Urgent to Intervention	count of Urgent to On Watch	Count of Urgent to At Benchmark	count of Urgent to Above Benchmark
Intervention
On Watch
At Benchmark
Above Benchmark	count of Above to Urgent	count of Above to Intervention	count of Above to On Watch	count of Above to At Benchmark	count of Above to Above

In Excel pivot tables this is trivial (defining rows and columns, count IDs) but I am having a hard time wrapping my head around it in terms of grouping and summarizing (or pivot-tabling) in R. Thanks.

I have tried pivot_wider and pivot_longer, but each only gets me the row or column headings and one set of counts.

progress <- reading_data %>%
  select(studentID,TestingWindow,BenchmarkCategories) %>%
  group_by(TestingWindow,BenchmarkCategories) %>%
  summarize(count=n_distinct(studentID)) %>%
  pivot_wider(names_from=TestingWindow,values_from=count)

progress

Solution

You can do this using two pivot_wider() calls:

first pivot_wider() to create separate columns for each value in TestingWindow e.g. StartofYear and EndofYear
create counts per StartofYear/EndofYear pairing, then pivot_wider() again so EndofYear values are pivoted to columns

Note that if you want all BenchmarkCategories to be included in the result, you'll need an extra step as detailed below. I have also included a larger example dataset to illustrate results.

First, with your sample data (with modified column names to match your code pipe column names):

library(dplyr)
library(tidyr)

reading_data <- structure(list(studentID = c(123L, 456L, 456L, 789L, 789L), TestingWindow = c("StartofYear", 
"StartofYear", "EndofYear", "StartofYear", "EndofYear"), BenchmarkCategories = c("Urgent Intervention", 
"Intervention", "At Benchmark", "On Watch", "Above Benchmark"
)), class = "data.frame", row.names = c(NA, -5L))

progress <- reading_data |>
  pivot_wider(names_from = TestingWindow,
              values_from = BenchmarkCategories) |>
  count(StartofYear, EndofYear, .drop = FALSE) |>
  pivot_wider(names_from = EndofYear,
              values_from = n,
              values_fill = 0) |>
  rename(`v start end >` = StartofYear)

progress
# # A tibble: 3 × 4
#   `v start end >`     `At Benchmark` `Above Benchmark`  `NA`
#   <chr>                        <int>             <int> <int>
# 1 Intervention                     1                 0     0
# 2 On Watch                         0                 1     0
# 3 Urgent Intervention              0                 0     1

Pairs not represented are dropped so not all BenchmarkCategories are present. Also, the NA column counts the incomplete records e.g. studentID == 123.

If you want all BenchmarkCategories in the result:

# Create vector of all unique BenchmarkCategories
bmc <- unique(reading_data$BenchmarkCategories)

progress <- reading_data |>
  pivot_wider(names_from = TestingWindow,
              values_from = BenchmarkCategories) |>
  count(StartofYear, EndofYear, .drop = FALSE) |>
  complete(StartofYear = bmc,
           EndofYear = bmc,
           fill = list(n = 0)) |>
  pivot_wider(names_from = EndofYear,
              values_from = n,
              values_fill = 0) |>
  rename(`v start end >` = StartofYear)
  

progress
# # A tibble: 5 × 7
#   StartofYear         `Above Benchmark` `At Benchmark` Intervention `On Watch` `Urgent Intervention`  `NA`
#   <chr>                           <int>          <int>        <int>      <int>                 <int> <int>
# 1 Above Benchmark                     0              0            0          0                     0     0
# 2 At Benchmark                        0              0            0          0                     0     0
# 3 Intervention                        0              1            0          0                     0     0
# 4 On Watch                            1              0            0          0                     0     0
# 5 Urgent Intervention                 0              0            0          0                     0     1

Example using larger example dataset:

set.seed(42)
reading_data <- data.frame(
  studentID = c(123, rep(456:789, each = 2)),
  TestingWindow = c("StartofYear", rep(c("StartofYear", "EndofYear"), 334)),
  BenchmarkCategories = sample(c("Urgent Intervention","Intervention",
                                 "At Benchmark", "On Watch", "Above Benchmark"),
                               669, replace = TRUE))

progress <- reading_data |>
  pivot_wider(names_from = TestingWindow,
              values_from = BenchmarkCategories) |>
  count(StartofYear, EndofYear, .drop = FALSE) |>
  # complete(StartofYear = bmc,
  #          EndofYear = bmc,
  #          fill = list(n = 0)) |>
  pivot_wider(names_from = EndofYear,
              values_from = n,
              values_fill = 0) |>
  rename(`v start end >` = StartofYear)

progress
# # A tibble: 5 × 7
#   `v start end >`     `Above Benchmark` `At Benchmark` Intervention `On Watch` `Urgent Intervention`  `NA`
#   <chr>                           <int>          <int>        <int>      <int>                 <int> <int>
# 1 Above Benchmark                    12              8           13         13                     8     0
# 2 At Benchmark                       14              9           11         12                    15     0
# 3 Intervention                       18             17           18         15                    20     0
# 4 On Watch                           14             10           14          7                    15     0
# 5 Urgent Intervention                11             21           14         17                     8     1