r loops iterator iteration tidyverse

Using tidyverse to loop or iterate over my dataframe to replace all values before the first occurrence of a specific value and after the last occurrence of another value


I'm working with psychological tests composed of 10 items of gradually increasing difficulty. Scores can be 0 (wrong), 1 (almost correct), or 2 (correct answer). So question 2 is harder than question 1, question 3 is harder than questions 1 and 2, and so on. People don't answer all of the questions (items), only some of them. The scoring system should do this:

-- All columns before the first column with the value "2" should be replaced with "2". [We assume the participant would have answered these easier questions correctly.]
-- All columns after the last column with the value "0" should be replaced with "0". [We assume the participant would have failed these harder questions.]

I have tried several approaches without success. I would like to stay with tidyverse, using mutate if possible.

The dataframe is:

df = structure(list(x1_2 = c(NA, 2, NA, NA, NA, NA), x2_2 = c(NA, 2, 
2, 2, 2, 2), x3_2 = c(2, 2, 1, 1, 2, 2), x4_2 = c(2, 1, 0, 2, 
0, 0), x5_2 = c(2, 1, NA, 2, NA, NA)), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame"))
df

The desired dataframe is:

df_final = structure(list(x1_2 = c(2, 2, 2, 2, 2, 2), x2_2 = c(2, 2, 
    2, 2, 2, 2), x3_2 = c(2, 2, 1, 1, 2, 2), x4_2 = c(2, 1, 0, 2, 
    0, 0), x5_2 = c(2, 1, 0, 2, 0, 0)), row.names = c(NA, -6L), class = c("tbl_df", 
    "tbl", "data.frame"))
df_final


For the leading NA values, I can use this code:

initialNA <- function(x) {
  index <- cumsum(is.na(x)) >= seq_along(x)
  x[index] <- 2
  x
}

df <- data.frame(t(apply(df, 1, initialNA)))
df

Solution

  • You could modify your function slightly as follows:

    1. Include a second chunk of code to impute the trailing NAs, similar to your existing code, but simply reversing the vectors.

    2. Replace x[index] <- 2 with x[index] <- max(x, na.rm=TRUE), since a participant's best observed score may be only 1 ("almost correct"). Likewise, fill the trailing NAs with min(x, na.rm=TRUE).

    The final function looks like this:

    imputeNA <- function(x) {
      index1 <- cumsum(is.na(x)) >= seq_along(x)
      x[index1] <- max(x, na.rm=TRUE)
    
      index2 <- rev(cumsum(rev(is.na(x))) >= seq_along(x))
      x[index2] <- min(x, na.rm=TRUE)
      x
    }
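
    To see why the two index vectors pick out only the leading and trailing NAs, here is a minimal sketch on a single made-up score vector (the vector `x` and the names `lead_mask` and `trail_mask` are chosen for illustration and are not part of the original data). The cumulative count of NAs can keep pace with the position only while every value so far has been NA, so the comparison stops being TRUE at the first non-NA value. Reversing the vector applies the same idea from the right-hand side.

    ```r
    # Hypothetical score vector: two leading NAs, one interior NA, one trailing NA
    x <- c(NA, NA, 2, 1, NA, 0, NA)

    # TRUE only while every value seen so far is NA (the leading run)
    lead_mask <- cumsum(is.na(x)) >= seq_along(x)

    # Same test on the reversed vector, flipped back: the trailing run
    trail_mask <- rev(cumsum(rev(is.na(x))) >= seq_along(x))

    print(lead_mask)
    print(trail_mask)
    ```

    Note that the interior NA (position 5) is in neither mask, so it is left untouched.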
    

    Now, modify your toy data slightly so that the first participant has all ones instead of twos, in order to better test the function.

    df[1,] <- data.frame(NA, NA, 1, 1, 1)
    df
    
       x1_2  x2_2  x3_2  x4_2  x5_2
      <dbl> <dbl> <dbl> <dbl> <dbl>
    1    NA    NA     1     1     1
    2     2     2     2     1     1
    3    NA     2     1     0    NA
    4    NA     2     1     2     2
    5    NA     2     2     0    NA
    6    NA     2     2     0    NA
    

    Test it:

    data.frame(t(apply(df, 1, imputeNA)))
    
      x1_2 x2_2 x3_2 x4_2 x5_2
    1    1    1    1    1    1
    2    2    2    2    1    1
    3    2    2    1    0    0
    4    2    2    1    2    2
    5    2    2    2    0    0
    6    2    2    2    0    0
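
Since the question asks for a tidyverse approach, the same function could also be applied row by row inside a dplyr/tidyr pipeline instead of apply(). The following is only a sketch, assuming dplyr, tidyr and tibble are installed; the helper column `id` and the result name `result` are introduced here for illustration. It reshapes the data to long format, runs imputeNA() within each row group, and reshapes back.

```r
library(dplyr)
library(tidyr)

# Original toy data from the question
df <- tibble::tibble(
  x1_2 = c(NA, 2, NA, NA, NA, NA),
  x2_2 = c(NA, 2, 2, 2, 2, 2),
  x3_2 = c(2, 2, 1, 1, 2, 2),
  x4_2 = c(2, 1, 0, 2, 0, 0),
  x5_2 = c(2, 1, NA, 2, NA, NA)
)

# Function from the answer above
imputeNA <- function(x) {
  index1 <- cumsum(is.na(x)) >= seq_along(x)
  x[index1] <- max(x, na.rm = TRUE)

  index2 <- rev(cumsum(rev(is.na(x))) >= seq_along(x))
  x[index2] <- min(x, na.rm = TRUE)
  x
}

result <- df %>%
  mutate(id = row_number()) %>%                  # hypothetical helper: one id per participant
  pivot_longer(cols = -id, names_to = "item", values_to = "score") %>%
  group_by(id) %>%
  mutate(score = imputeNA(score)) %>%            # impute within each participant's row
  ungroup() %>%
  pivot_wider(names_from = item, values_from = score) %>%
  select(-id)

result
```

This relies on pivot_longer() keeping the items in their original column order within each participant, which is what makes "before" and "after" meaningful. A row containing only NA values would make max() and min() return infinite values with a warning, so such rows would need separate handling.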