[SOLVED] How to reorganize data based on unique cases of a variable?

How to reorganize data based on unique cases of a variable?

I have a dataset that has course completions from a variety of subject areas for a very large number of students. Each row of data is organized as follows:

Student ID	Gender	City	Course
10102	Male	City 1	Science
10304	Female	City 2	Science
10304	Female	City 2	Math
10506	Male	City 3	English
10506	Male	City 3	French
10506	Male	City 3	Science

I would like to reorganize the dataset to have a single row of data for each unique student ID and then have all of the courses that student took added to a list variable for that student.

I've used R for this dataset as it's very large (~1.5 million rows) and it seems to be effective for that whereas Excel or SPSS are not.

I've used the filter() function for a few other things, but I don't think it'll work in this case because I don't want duplicates of each student ID. Just one that has each unique course code they have taken listed inside of it.

Solution

We can use dplyr here along with group_by:

library(dplyr)

df_out <- df %>%
    group_by(`Student ID`) %>%
    summarise(Course = list(Course), .groups = 'drop')

df_out

# A Tibble: 3 × 2
  `Student ID` Course   
         <dbl> <list>   
1        10102 <chr [1]>
2        10304 <chr [2]>
3        10506 <chr [3]>