
R: using limits to find out-of-range data in a data frame


I have a large data frame filled with numbers and a second data frame holding limits (a low and a high acceptable value) for each column. I'm wondering how I can use those limits to find the values that fall outside the acceptable range in each column. I can do this with a for loop, but that is a messy (and, I'm sure, inefficient) solution, so I'm wondering if there is another way.

For example

# Create a data frame with values ranging from 0 to 10
sampleData <- data.frame(replicate(9, sample(0:10, 10, replace = TRUE)))

  X1 X2 X3 X4 X5 X6 X7 X8 X9
1  1  7  9  0  7  3  0  0  8
2  4  8  3  4  9  6  3  2  3
3  9  7  5  2  7  5 10  9  4
4  2  6  2  1  3  9  4  3  9
5 10  2  2  6  4  7  4  9  7

# A second data frame holds the limits: row 1 is the low limit, row 2 the high limit
  X1 X2 X3 X4 X5 X6 X7 X8 X9
1  1  7  3  4  7  3  0  0  3
2  4  8  9 10  9  6  3  2  8

I want to know which rows fail, i.e. which values fall outside the limits for their column. For the data above, the failures would be:

Col 1: 3,5
Col 2: 4,5
Col 3: 4,5
Col 4: 1,3,4
Col 5: 4,5
Col 6: 4,5
Col 7: 3,4,5
Col 8: 3,4,5
Col 9: 4

Thank you!


Solution

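  • We can use base R's mapply, assuming your limits data frame is called limits. mapply walks through the columns of both data frames in parallel, so for each column x of sampleData it receives the matching column y of limits (y[1] is the low limit, y[2] the high limit). which() then returns the row indices whose values fall outside those limits.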

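    # SIMPLIFY = FALSE keeps the result a list even when every column has the same number of failures
    mapply(function(x, y) which(x < y[1] | x > y[2]), sampleData, limits, SIMPLIFY = FALSE)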
    
    
    #$X1
    #[1] 3 5
    
    #$X2
    #[1] 4 5
    
    #$X3
    #[1] 4 5
    
    #$X4
    #[1] 1 3 4
    
    #$X5
    #[1] 4 5
    
    #$X6
    #[1] 4 5
    
    #$X7
    #[1] 3 4 5
    
    #$X8
    #[1] 3 4 5
    
    #$X9
    #[1] 4
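
To try the approach above end to end, here is a minimal, self-contained sketch. It rebuilds the five rows shown in the question by hand (the original data came from sample(), so it cannot be regenerated exactly) and assumes that row 1 of limits is the low limit and row 2 the high limit. The names failures, low, high, m, and outside are illustrative only. It uses Map(), which behaves like mapply() with SIMPLIFY = FALSE, and then shows one possible vectorised alternative based on sweep().

```r
# The five example rows from the question, typed in by hand
sampleData <- data.frame(
  X1 = c(1, 4, 9, 2, 10),
  X2 = c(7, 8, 7, 6, 2),
  X3 = c(9, 3, 5, 2, 2),
  X4 = c(0, 4, 2, 1, 6),
  X5 = c(7, 9, 7, 3, 4),
  X6 = c(3, 6, 5, 9, 7),
  X7 = c(0, 3, 10, 4, 4),
  X8 = c(0, 2, 9, 3, 9),
  X9 = c(8, 3, 4, 9, 7)
)

# Assumed layout: row 1 = low limit, row 2 = high limit
limits <- data.frame(
  X1 = c(1, 4), X2 = c(7, 8), X3 = c(3, 9),
  X4 = c(4, 10), X5 = c(7, 9), X6 = c(3, 6),
  X7 = c(0, 3), X8 = c(0, 2), X9 = c(3, 8)
)

# Map() always returns a list: one vector of failing row indices per column
failures <- Map(function(x, lim) which(x < lim[1] | x > lim[2]),
                sampleData, limits)

# Alternative sketch: build a logical matrix that marks every out-of-range cell
low     <- unlist(limits[1, ])
high    <- unlist(limits[2, ])
m       <- as.matrix(sampleData)
outside <- sweep(m, 2, low, "<") | sweep(m, 2, high, ">")

# Row/column positions of all failing cells
which(outside, arr.ind = TRUE)
```

The list form is convenient when you want the failing rows per column, while the logical matrix is handy for follow-up steps such as rowSums(outside) > 0 to flag rows with at least one failure.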