I'm working with event data from a dataset that covers multiple courses across multiple semesters. I need to find out how many students "logged" into each course each week, and then what percentage of the course's students that represents.
Below is some sample code:
library(dplyr)      # %>%, mutate, group_by, reframe, n_distinct
library(lubridate)  # ymd, week

events <- data.frame(student_id = c(101, 101, 101,
                                    102, 102, 102,
                                    155, 155, 155,
                                    101, 101),
                     event_date = as.Date(c("11/09/2000", "11/10/2000", "11/12/2000",
                                            "11/09/2000", "11/10/2000", "11/12/2000",
                                            "11/09/2001", "11/14/2001", "11/15/2001",
                                            "11/09/2001", "11/15/2001"), "%m/%d/%Y"),
                     course_id = c(111, 111, 122,
                                   111, 111, 111,
                                   122, 122, 122,
                                   111, 111),
                     term = c("Fall 2000", "Fall 2000", "Fall 2000",
                              "Fall 2000", "Fall 2000", "Fall 2000",
                              "Fall 2001", "Fall 2001", "Fall 2001",
                              "Fall 2001", "Fall 2001"))
Calculation for daily events:
daily_events <- events %>%
  mutate(daily_event_count = ymd(event_date)) %>%
  group_by(course_id, term, week = week(event_date)) %>%
  reframe(total_events = n_distinct(event_date),
          stud_event_count = n_distinct(student_id))
Now I want to find out what percentage of the course "logged in" on any given week. I did some hard coding to get it to run but I know there is a better way to do this, right? This is okay since my sample data has only two courses, but in reality I have many.
Below is the mutate command I added with an ifelse statement:
daily_events <- events %>%
  mutate(daily_event_count = ymd(event_date)) %>%
  group_by(course_id, term, week = week(event_date)) %>%
  reframe(total_events = n_distinct(event_date),
          stud_event_count = n_distinct(student_id)) %>%
  mutate(stud_pct = ifelse(course_id == 111 & term == 'Fall 2000', (stud_event_count / 2) * 100,
                    ifelse(course_id == 122 & term == 'Fall 2000', (stud_event_count / 1) * 100, 0)))
I calculated the total number of distinct students per course and term in a separate query, which is where the 2 and 1 come from.
stud_distr <- events %>%
  group_by(term, course_id) %>%
  reframe(stud_count = n_distinct(student_id))
How can I get the percentage when I have a lot more data and an ifelse or case/when seems inefficient?
Something like this:
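daily_events_2 <- events %>%
  mutate(week = week(event_date)) %>%
  # (1) total distinct students per term and course, added to every row
  mutate(stud_count = n_distinct(student_id),
         .by = c(term, course_id)) %>%
  # (2) weekly counts; stud_count is listed in .by so it survives the summary
  summarise(total_events = n_distinct(event_date),
            stud_event_count = n_distinct(student_id),
            .by = c(course_id, term, week, stud_count)) %>%
  # (3) percentage of the course's students seen that week
  mutate(stud_pct = stud_event_count / stud_count * 100)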
I rarely use reframe. Instead, I use summarise when I need to collapse each group into a single row, or mutate when I want to return the same number of rows.
The code runs in 3 steps: (1) compute the total number of distinct students for each term and course without changing the other data, (2) compute the counts and (3) compute the percents.
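If you would rather keep the per-course totals as a separate table, as in the stud_distr query from the question, the same three steps can be written with a join instead of carrying stud_count through .by. This is only a sketch under a few assumptions: dplyr 1.1.0 or later (for the .by argument), lubridate for week(), and the name weekly_events, which is made up for illustration.

```r
library(dplyr)      # assumes dplyr >= 1.1.0 for the `.by` argument
library(lubridate)  # provides week()

# Sample data from the question, written compactly
events <- data.frame(
  student_id = c(101, 101, 101, 102, 102, 102, 155, 155, 155, 101, 101),
  event_date = as.Date(
    c("11/09/2000", "11/10/2000", "11/12/2000",
      "11/09/2000", "11/10/2000", "11/12/2000",
      "11/09/2001", "11/14/2001", "11/15/2001",
      "11/09/2001", "11/15/2001"),
    format = "%m/%d/%Y"
  ),
  course_id = c(111, 111, 122, 111, 111, 111, 122, 122, 122, 111, 111),
  term = rep(c("Fall 2000", "Fall 2001"), times = c(6, 5))
)

# (1) distinct students per term and course: one row per course offering
stud_distr <- events %>%
  summarise(stud_count = n_distinct(student_id), .by = c(term, course_id))

# (2) weekly counts per course offering, then
# (3) attach the totals by key and compute the percentage
weekly_events <- events %>%
  mutate(week = week(event_date)) %>%
  summarise(total_events = n_distinct(event_date),
            stud_event_count = n_distinct(student_id),
            .by = c(course_id, term, week)) %>%
  left_join(stud_distr, by = c("term", "course_id")) %>%
  mutate(stud_pct = stud_event_count / stud_count * 100)
```

Either way, stud_count only includes students who appear in events at least once. If a roster of enrolled students exists, joining that in place of stud_distr would likely give a more accurate denominator.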