I have a data set of users who attempt a set of math problems. They are randomly given a hint(s) to solve the problem (hint.flag). I want to find out how soon a user gets the "correct" answer (Correctness) for each problem after seeing the FIRST hint within a problem. To begin with I would need to create some kind of index to identify which post hint attempt they are on.
data.input <- read_table2("user.id problem.id correctness hint.flag
540995 A FALSE 1
540995 A FALSE 0
540995 A FALSE 1
540995 B FALSE 1
540995 B FALSE 1
540995 B TRUE 0
540995 C FALSE 1
540995 C FALSE 1
540995 C TRUE 0
540995 D TRUE 0
565662 A TRUE 0
565662 A FALSE 1
565662 A FALSE 0
565662 A FALSE 1
565662 A FALSE 1
565662 A FALSE 0
565662 A FALSE 0
565662 A FALSE 0")
The column I am trying to impute is "post.hint.attempt" (see desired.output).
desired.output <- read_table2("user.id problem.id correctness hint.flag post.hint.attempt
540995 A FALSE 1
540995 A FALSE 0 first
540995 A FALSE 1 second
540995 B FALSE 1
540995 B FALSE 1 first
540995 B TRUE 0 second
540995 C FALSE 1
540995 C FALSE 1 first
540995 C TRUE 0 second
540995 D TRUE 0
565662 A TRUE 0
565662 A FALSE 1
565662 A FALSE 0 first
565662 A FALSE 1 second
565662 A FALSE 1 third
565662 A FALSE 0 fourth
565662 A FALSE 0 fifth
565662 A FALSE 0 sixth
")
I have no idea how to even begin finding this... I think row_id might be useful, but the need to lag the attempt number by the occurrence of a hint is confusing to me. Any help would be amazing!!!
Just in case, here is another approach with tidyverse
. It involves checking the row_number
within a given problem, and add a counter value if it comes after the first hint. A cumulative sum of this counter is then added.
(Note you will get a warning if a problem does not have any hints.)
library(tidyverse)
data.input %>%
group_by(user.id, problem.id) %>%
mutate(count = ifelse(row_number() > min(which(hint.flag == 1)), 1, 0),
post.hint.attempt = cumsum(count)) %>%
select(-count)
Output
# A tibble: 18 x 5
# Groups: user.id, problem.id [5]
user.id problem.id correctness hint.flag post.hint.attempt
<dbl> <chr> <lgl> <dbl> <dbl>
1 540995 A FALSE 1 0
2 540995 A FALSE 0 1
3 540995 A FALSE 1 2
4 540995 B FALSE 1 0
5 540995 B FALSE 1 1
6 540995 B TRUE 0 2
7 540995 C FALSE 1 0
8 540995 C FALSE 1 1
9 540995 C TRUE 0 2
10 540995 D TRUE 0 0
11 565662 A TRUE 0 0
12 565662 A FALSE 1 0
13 565662 A FALSE 0 1
14 565662 A FALSE 1 2
15 565662 A FALSE 1 3
16 565662 A FALSE 0 4
17 565662 A FALSE 0 5
18 565662 A FALSE 0 6