I'm working with psychological tests composed of 10 items of gradually increasing difficulty. Each item is scored 0 (wrong), 1 (almost correct), or 2 (correct). Question 2 is harder than question 1, question 3 is harder than questions 1 and 2, and so on. Participants answer only some of the questions (items), not all of them. The scoring system should do the following:
-- All columns before the first column with the value "2" should be replaced with "2". [We assume the participant would have answered these questions correctly.]
-- All columns after the last column with the value "0" should be replaced with "0". [We assume the participant would have failed these questions.]
I have tried several solutions without success. I would like to stay within the tidyverse, using mutate if possible.
The data frame is:
df = structure(list(x1_2 = c(NA, 2, NA, NA, NA, NA), x2_2 = c(NA, 2,
2, 2, 2, 2), x3_2 = c(2, 2, 1, 1, 2, 2), x4_2 = c(2, 1, 0, 2,
0, 0), x5_2 = c(2, 1, NA, 2, NA, NA)), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
df
The desired data frame is:
df_final = structure(list(x1_2 = c(2, 2, 2, 2, 2, 2), x2_2 = c(2, 2,
2, 2, 2, 2), x3_2 = c(2, 2, 1, 1, 2, 2), x4_2 = c(2, 1, 0, 2,
0, 0), x5_2 = c(2, 1, 0, 2, 0, 0)), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
df_final
For the leading values, I could use this code:
initialNA <- function(x) {
  # TRUE for each position where every value so far has been NA
  index <- cumsum(is.na(x)) >= seq_along(x)
  x[index] <- 2
  x
}
df <- data.frame(t(apply(df, 1, initialNA)))
df
You could modify your function slightly as follows:
-- Add a second chunk of code to impute the trailing NAs, similar to your existing code but with the vectors reversed.
-- Replace x[index] <- 2 with x[index] <- max(x, na.rm=TRUE), since an individual may get a question only "almost correct". Likewise, fill the trailing NAs with min(x, na.rm=TRUE).
The final function looks like this:
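imputeNA <- function(x) {
  # Leading NAs: positions where every value so far has been NA
  index1 <- cumsum(is.na(x)) >= seq_along(x)
  x[index1] <- max(x, na.rm=TRUE)
  # Trailing NAs: the same test applied to the reversed vector
  index2 <- rev(cumsum(rev(is.na(x))) >= seq_along(x))
  x[index2] <- min(x, na.rm=TRUE)
  x
}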
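To see how the two index vectors work, it may help to trace them on a single row. The sketch below uses a made-up vector (not taken from the data above) with two unanswered items at each end. It repeats the body of the function step by step rather than calling it.

```r
# Hypothetical example row: two unanswered items at each end
x <- c(NA, NA, 2, 1, NA, NA)

# Leading NAs: the running count of NAs keeps pace with the position
# only while every value seen so far has been NA
index1 <- cumsum(is.na(x)) >= seq_along(x)
index1
# TRUE TRUE FALSE FALSE FALSE FALSE

# Fill the leading gap with the best observed score
x[index1] <- max(x, na.rm = TRUE)

# Trailing NAs: reverse the vector, apply the same test, reverse back
index2 <- rev(cumsum(rev(is.na(x))) >= seq_along(x))
index2
# FALSE FALSE FALSE FALSE TRUE TRUE

# Fill the trailing gap with the worst observed score
x[index2] <- min(x, na.rm = TRUE)
x
# 2 2 2 1 1 1
```

Note that the trailing items are filled with 1 here, not 0, because the lowest observed score in this row is 1. Also, a row with no answers at all would make max() and min() return -Inf and Inf with a warning, so such rows would need separate handling.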
Now, modify your toy data slightly so that the first person has all ones instead of twos, in order to better test the function.
df[1,] <- data.frame(NA, NA, 1, 1, 1)
df
   x1_2  x2_2  x3_2  x4_2  x5_2
  <dbl> <dbl> <dbl> <dbl> <dbl>
1    NA    NA     1     1     1
2     2     2     2     1     1
3    NA     2     1     0    NA
4    NA     2     1     2     2
5    NA     2     2     0    NA
6    NA     2     2     0    NA
Test it:
data.frame(t(apply(df, 1, imputeNA)))
  x1_2 x2_2 x3_2 x4_2 x5_2
1    1    1    1    1    1
2    2    2    2    1    1
3    2    2    1    0    0
4    2    2    1    2    2
5    2    2    2    0    0
6    2    2    2    0    0
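Since the question asks for a tidyverse approach with mutate, one possible sketch is to reshape the data to long format, apply the function within each participant, and reshape back. This is only one way to do it, not the only one. The helper column names id, item and score are made up for illustration, and the data are the original values from the question (before the first row was changed to ones).

```r
library(dplyr)
library(tidyr)

# Same function as above, repeated so that this block runs on its own
imputeNA <- function(x) {
  index1 <- cumsum(is.na(x)) >= seq_along(x)
  x[index1] <- max(x, na.rm = TRUE)
  index2 <- rev(cumsum(rev(is.na(x))) >= seq_along(x))
  x[index2] <- min(x, na.rm = TRUE)
  x
}

# Original data from the question
df <- tibble(
  x1_2 = c(NA, 2, NA, NA, NA, NA),
  x2_2 = c(NA, 2, 2, 2, 2, 2),
  x3_2 = c(2, 2, 1, 1, 2, 2),
  x4_2 = c(2, 1, 0, 2, 0, 0),
  x5_2 = c(2, 1, NA, 2, NA, NA)
)

result <- df %>%
  mutate(id = row_number()) %>%                 # one id per participant
  pivot_longer(cols = -id, names_to = "item", values_to = "score") %>%
  group_by(id) %>%
  mutate(score = imputeNA(score)) %>%           # impute within each participant
  ungroup() %>%
  pivot_wider(names_from = item, values_from = score) %>%
  select(-id)

result
```

This relies on pivot_longer() keeping the items of each participant in their original column order, because imputeNA() depends on the order of the values.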