
R: trimming a variable and adding it to a dataframe


I am an R beginner. I would like to trim a variable using the Trim function of the package "DescTools". This works fine with:

library(DescTools)

mydata <- data.frame(
 a = rnorm(40, mean = 0, sd = 1)
 )
a_trim <- Trim(mydata$a, trim = 0.2, na.rm = TRUE)

This creates a separate object, but I would like to add it to my data frame mydata. When I try to do this with

mydata$a_trim <- Trim(mydata$a, trim = 0.2, na.rm = TRUE)

R gives me an error because the trimmed vector has fewer elements than the data frame has rows (obviously, since it has been trimmed). How can I do this?

Thanks for your patience and help!


Solution

  • Trim isn't suitable for what you want to do. It removes extreme values from a vector, so the result is shorter than the input. The idea is that you pass the shortened vector to something like mean or sd, so that those quantities are computed without the influence of outliers.

    To set extreme values to NA instead, which keeps the column at its original length, you can use quantile.

    upper_quantile <- quantile(mydata$a, 0.9)
    lower_quantile <- quantile(mydata$a, 0.1)
    
    # values of a above its 90th percentile become NA
    mydata$a[mydata$a > upper_quantile] <- NA
    # values of a below its 10th percentile become NA
    mydata$a[mydata$a < lower_quantile] <- NA
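If you would rather keep the original column and store the result in a new one, as in the question, a minimal sketch could look like the following. It uses base R only. The column name a_trim comes from the question, while limits and set.seed(1) are just illustrative choices. The 0.1 and 0.9 cut-offs follow the code above; to trim 20% from each end, as trim = 0.2 in the question presumably intends, you would use 0.2 and 0.8 instead. Note that quantile-based cut-offs are not guaranteed to drop exactly the same observations as Trim.

```r
set.seed(1)  # only so that the example is reproducible
mydata <- data.frame(a = rnorm(40, mean = 0, sd = 1))

# Cut-offs: the 10th and 90th percentiles of a (ignoring any existing NAs)
limits <- quantile(mydata$a, probs = c(0.1, 0.9), na.rm = TRUE)

# New column: a copy of a with everything outside the cut-offs set to NA.
# The original column a is left untouched.
mydata$a_trim <- ifelse(mydata$a < limits[1] | mydata$a > limits[2],
                        NA, mydata$a)

# a_trim has one entry per row, so it fits into the data frame
nrow(mydata)
sum(is.na(mydata$a_trim))  # number of values that were blanked out
```

Functions such as mean and sd can then skip the blanked-out values with na.rm = TRUE, for example mean(mydata$a_trim, na.rm = TRUE).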