r  loops  dataframe  cap

R Loop through Columns


How can I loop through the columns of a data frame and cap the values at the 97.5th percentile of each column?

E.g., if a column contains the values 1 to 100, the values above 97.5 (i.e. 98, 99 and 100) should all be set to 97.5.

Note that I want to do this for columns 4 through the last column of the data frame.


Solution

  • You can do this in one line in base R: quantile() gives each column's 97.5th percentile, and pmin() replaces any larger value with it.

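    # set up the data: three columns of 100 random integers between 1 and 100
    df <- data.frame(a = sample(100, replace = TRUE),
                     b = sample(100, replace = TRUE),
                     c = sample(100, replace = TRUE))

    # cap every column at its own 97.5th percentile
    df2 <- as.data.frame(lapply(df, function(x) pmin(x, quantile(x, 0.975))))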
    

    To modify only columns 4 to 10 (for example) of a data frame named data, you could do

    data[,4:10] <- as.data.frame(lapply(data[,4:10], function(x) pmin(x, quantile(x, 0.975))))
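For the case in the question (column 4 through the last column), the same idea can be sketched with 4:ncol(...) instead of a fixed range. The helper name cap_at_quantile and the example data frame dat are made up for illustration, and na.rm = TRUE is an added assumption so that missing values do not make quantile() fail. This assumes the data frame has at least 4 columns and that columns 4 onward are numeric.

```r
# Hypothetical helper: cap a numeric vector at its own upper quantile.
# na.rm = TRUE is an assumption; NA entries stay NA in the result.
cap_at_quantile <- function(x, prob = 0.975) {
  pmin(x, quantile(x, prob, na.rm = TRUE, names = FALSE))
}

set.seed(42)  # reproducible example data
dat <- data.frame(id   = 1:100,
                  grp  = rep(c("a", "b"), 50),
                  flag = TRUE,
                  x    = 1:100,
                  y    = sample(100, replace = TRUE))

cols <- 4:ncol(dat)                              # column 4 through the last
dat[cols] <- lapply(dat[cols], cap_at_quantile)  # columns 1 to 3 are untouched

max(dat$x)  # 97.525
```

Note that with R's default quantile type the 97.5th percentile of 1:100 is 97.525 rather than exactly 97.5, so 98, 99 and 100 are all replaced by 97.525.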