Tags: r, group-by, percentage

Calculate total percentage with total N and events in two data frames in R


I'm working with event data from a dataset that covers multiple courses across multiple semesters. I need to find out how many students "logged" into each course each week, and then what percentage of the course's students that represents.

Below is some sample data:

events <- data.frame(student_id= c(101, 101, 101, 
                         102, 102, 102,
                         155, 155, 155,
                         101, 101), 
             event_date=as.Date(c("11/09/2000","11/10/2000","11/12/2000",
                            "11/09/2000","11/10/2000", "11/12/2000",
                            "11/09/2001","11/14/2001","11/15/2001",
                            "11/09/2001","11/15/2001"), "%m/%d/%Y"), 
             course_id=c(111,111,122,
                         111,111,111,
                         122,122,122,
                         111,111),
             term=c("Fall 2000","Fall 2000", "Fall 2000",
                    "Fall 2000","Fall 2000", "Fall 2000",
                    "Fall 2001","Fall 2001", "Fall 2001",
                    "Fall 2001","Fall 2001"))

Calculation of the event counts per week:

library(dplyr)
library(lubridate)

# Per course, term and week: distinct event dates and distinct students
daily_events <- events %>%
  mutate(daily_event_count = ymd(event_date)) %>%
  group_by(course_id, term, week=week(event_date)) %>%
  reframe(total_events = n_distinct(event_date),
            stud_event_count = n_distinct(student_id)) 

Now I want to find out what percentage of each course "logged in" in any given week. I hard-coded the denominators to get it to run, but I assume there is a better way. Hard-coding is manageable here because my sample data has only two courses, but in reality I have many.

Below is the mutate command I added with an ifelse statement:

daily_events <- events %>%
  mutate(daily_event_count = ymd(event_date)) %>%
  group_by(course_id, term, week=week(event_date)) %>%
  reframe(total_events = n_distinct(event_date),
            stud_event_count = n_distinct(student_id)) %>% 
  # Hard-coded denominators (2 and 1) are the distinct students per course and term
  mutate(stud_pct = ifelse(course_id==111 & term=='Fall 2000', (stud_event_count/2)*100,
                           ifelse(course_id==122 & term=='Fall 2000', (stud_event_count/1)*100,0)))

I calculated the total number of distinct students per course and term in a separate query, which is where the 2 and 1 come from:

stud_distr <- events %>%
  group_by(term, course_id) %>%
  reframe(stud_count = n_distinct(student_id))

How can I get the percentage when I have a lot more data and an ifelse or case_when chain becomes impractical?


Solution

  • Something like this:

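    daily_events_2 <- events %>% 
      mutate(week = week(event_date)) %>% 
      # (1) distinct students per term and course, added to every row
      mutate(stud_count = n_distinct(student_id), 
             .by = c(term, course_id)) %>% 
      # (2) weekly counts; stud_count is kept by including it in .by
      summarise(total_events = n_distinct(event_date),
                stud_event_count = n_distinct(student_id),
                .by = c(course_id, term, week, stud_count)) %>% 
      # (3) percentage of the course's students seen that week
      mutate(stud_pct = stud_event_count / stud_count * 100)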
    

    I rarely use reframe. Instead, I use summarise when I need to collapse each group into a single row, or mutate when I want to keep the same number of rows.

    The code runs in 3 steps: (1) compute the total number of distinct students for each term and course without changing the other data, (2) compute the counts and (3) compute the percents.
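
    If you would rather keep the separate stud_distr query from the question, the same three steps can also be sketched as a join. This is only an illustration on the sample data, not a definitive implementation: the name weekly_events is made up here, and it assumes that term and course_id together identify a course offering.

```r
library(dplyr)
library(lubridate)

# Sample data copied from the question
events <- data.frame(
  student_id = c(101, 101, 101, 102, 102, 102, 155, 155, 155, 101, 101),
  event_date = as.Date(c("11/09/2000", "11/10/2000", "11/12/2000",
                         "11/09/2000", "11/10/2000", "11/12/2000",
                         "11/09/2001", "11/14/2001", "11/15/2001",
                         "11/09/2001", "11/15/2001"), "%m/%d/%Y"),
  course_id = c(111, 111, 122, 111, 111, 111, 122, 122, 122, 111, 111),
  term = c(rep("Fall 2000", 6), rep("Fall 2001", 5))
)

# (1) Lookup table: distinct students per term and course
stud_distr <- events %>%
  group_by(term, course_id) %>%
  summarise(stud_count = n_distinct(student_id), .groups = "drop")

# (2) Weekly counts, then (3) attach the denominator via a join
#     and compute the percentage. weekly_events is a hypothetical name.
weekly_events <- events %>%
  group_by(course_id, term, week = week(event_date)) %>%
  summarise(total_events = n_distinct(event_date),
            stud_event_count = n_distinct(student_id),
            .groups = "drop") %>%
  left_join(stud_distr, by = c("term", "course_id")) %>%
  mutate(stud_pct = stud_event_count / stud_count * 100)

print(weekly_events)
```

    The join keeps the denominator in its own table, which may be handy if the real student counts come from an enrolment source rather than from the events themselves.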