r, outliers, desctools

r - within-subject winsorization


I have a long-format data frame data.set, in which each subject has different numeric values (data.set$target_resp.rt) per condition. I have already winsorized my data with respect to an overall criterion using the DescTools function Winsorize:

overall.criterion.2sd <- data.set$overall.mean+(2*data.set$overall.sd)
winsors.2 <- DescTools::Winsorize(data.set$target_resp.rt, maxval=overall.criterion.2sd[1])

Above, it was possible to define maxval as the first value of the variable overall.criterion.2sd, as it's the same value for all subjects. Now I would like to winsorize my data by subject, i.e. I need to run a within-subject, row-by-row winsorization. Here's my attempt, where criterion.2sd holds N distinct values (N = no. of subjects):

criterion.2sd <- data.set$rt.mean+(2*data.set$rt.sd)
within.winsors.2 <- data.set %>% group_by(Nome, Cognome) %>%
                                    Winsorize(data.set$target_resp.rt, maxval=unique(criterion.2sd))

The following error pops up:

Error in [<-.data.frame(*tmp*, x < minval, value = c(1.35768795013, : 'value' is the wrong length

I understand that something is wrong with the cardinality of the maxval variable, but I can't figure out how to fix it. Can anybody help?

Here's a sample of the dataset data.set (hopefully it's enough; let me know if it's the right format):

   subject        target_resp.rt   rt.mean     rt.sd
 1 1              1.0398901        0.9016781   0.3109358
 2 1              0.6887729        0.9016781   0.3109358
 3 1              0.7691720        0.9016781   0.3109358
 4 1              1.0064900        0.9016781   0.3109358
 5 1              0.8195999        0.9016781   0.3109358
 6 2              0.8410320        1.0500845   0.4210796
 7 2              0.8229311        1.0500845   0.4210796
 8 2              0.9250839        1.0500845   0.4210796
 9 2              1.0085750        1.0500845   0.4210796
10 2              1.1406291        1.0500845   0.4210796
11 3              0.5561039        0.749789    0.2350127
12 3              0.6022139        0.749789    0.2350127
13 3              0.8560688        0.749789    0.2350127
14 3              0.5886030        0.749789    0.2350127
15 3              0.5520449        0.749789    0.2350127

Solution

  • The problem is mixed-up dplyr syntax. Winsorize expects a vector as its first argument, but data.set %>% group_by(Nome, Cognome) is a data frame, and the pipe (%>%) passes that whole data frame as the first argument of Winsorize. The vector data.set$target_resp.rt is then shifted into the second argument (minval), which is why the error mentions x < minval. In effect you're calling

    Winsorize(x = data.set, minval = ..., maxval = ...)
    

    What you really want is to use mutate after the group_by to change target_resp.rt. Inside mutate, columns are evaluated per group, so the per-subject criterion can be built from the rt.mean and rt.sd columns; the syntax looks like:

    data.set %>% group_by(subject) %>%
      mutate(target_winsorized = Winsorize(target_resp.rt, maxval = unique(rt.mean + 2 * rt.sd)))
    

    That creates a new variable target_winsorized in the dataset with the properties you want. In the future you might also want to save the criterion inside the dataset as its own column.
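The suggestion above to keep the criterion inside the dataset can be sketched roughly as follows. This is only an illustration under a few assumptions: the toy rows are made up (the 2.00 and 1.50 response times are invented outliers, not values from the real data), the rt.mean and rt.sd values are copied from the sample in the question, and base R's pmin() is swapped in for Winsorize so the sketch does not depend on the Winsorize arguments, which may differ between DescTools versions. Unlike a full winsorization, it caps only the upper tail.

```r
library(dplyr)

# Made-up toy rows in the layout of the question's sample.
# The 2.00 and 1.50 response times are invented outliers for illustration.
data.set <- data.frame(
  subject        = rep(1:3, each = 3),
  target_resp.rt = c(1.04, 0.69, 2.00,
                     0.84, 0.93, 1.14,
                     0.56, 1.50, 0.59),
  rt.mean        = rep(c(0.9016781, 1.0500845, 0.7497890), each = 3),
  rt.sd          = rep(c(0.3109358, 0.4210796, 0.2350127), each = 3)
)

within.winsors.2 <- data.set %>%
  mutate(
    # Per-subject upper criterion, stored as a regular column
    criterion.2sd = rt.mean + 2 * rt.sd,
    # pmin() is element-wise: values above the row's criterion are
    # replaced by that criterion, all other values are left unchanged
    target_winsorized = pmin(target_resp.rt, criterion.2sd)
  )

print(within.winsors.2)
```

Because each row already carries its own subject's rt.mean and rt.sd, the comparison is row-wise and no group_by is needed in this variant. If the mean and SD were instead computed on the fly with mean() and sd(), the group_by(subject) step would be required again.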

    Docs

    Check out the dplyr docs if you want to learn more about its syntax and style.