
Calculate rowwise Min/Max across a variable range of columns in R


For a data analysis in R I am trying to calculate a variable A1, which is the minimum value across a range of columns. The tricky thing is that the start of the range varies by row: it depends on the position of a previous variable, D1, which is the max value across the preceding columns.

Example:

df <- data.frame(ID = 1:5, V1 = c(2, 5, 2, 8, 3), V2 = c(3, 4, 4, 7, 1), V3 = c(7, 2, 8, 1, 5), V4 = c( 1, 2,3, 4, 6), V5 = c(3, 2, 5, 2, 8))
df

D1_range <- 2:3
df$D1 <- apply(df[,D1_range],1, max)
df$indexD1 <- apply(df[,D1_range], 1,which.max)
df

D1 is the max value for V1:V2, and indexD1 is its position within those columns. The range for A1 starts at position indexD1 + 1, counted among the V columns. So, for example, for ID=5 it starts at V2, whereas for ID=1 it starts at V3.
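One thing worth spelling out: `which.max` is applied to the V1:V2 subset, so the position it returns is relative to those columns, not to `df` (where `ID` sits in column 1). A minimal sketch of turning that position into a column name, assuming the value columns are named V1 to V5; `v_cols`, `index_d1`, `start_pos` and `start_col` are names made up for this illustration:

```r
df <- data.frame(ID = 1:5,
                 V1 = c(2, 5, 2, 8, 3), V2 = c(3, 4, 4, 7, 1),
                 V3 = c(7, 2, 8, 1, 5), V4 = c(1, 2, 3, 4, 6),
                 V5 = c(3, 2, 5, 2, 8))

v_cols <- paste0("V", 1:5)  # value columns, in order (assumed naming)

# Position of the row maximum within V1:V2 (1 = V1, 2 = V2)
index_d1 <- unname(apply(df[, v_cols[1:2]], 1, which.max))

# The A1 range starts one V column later
start_pos <- index_d1 + 1
start_col <- v_cols[start_pos]
```

Indexing by name this way sidesteps the off-by-one that the `ID` column introduces when positions are used directly on `df`.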

I tried to specify the range for A1 in a number of different ways. For example, by calculating a range:

df$A1_start <- df$indexD1+1
df$A1_end <- 6
df
df$A1 <- df %>% rowwise() %>% do.call(pmin, df[,df$A1_start:df$A1_end])

Or by using apply:

df$A1 <- apply(df[,df$A1_start:6], min)
df
df$A1 <- df %>% rowwise() %>% apply(df[,df$A1_start:6], min)
df

and with mutate:

df <- df %>% rowwise() %>% mutate(A1 = min(c_across(A1_range)))
df
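For comparison, here is a sketch of how the `c_across` idea could be made row-specific. It assumes dplyr 1.0.0 or later and that the value columns are V1 to V5; `res` is just a name chosen for this example. Inside `rowwise()`, every column holds a single value per row, so the start of the range is a scalar and `:` behaves as expected:

```r
library(dplyr)

df <- data.frame(ID = 1:5,
                 V1 = c(2, 5, 2, 8, 3), V2 = c(3, 4, 4, 7, 1),
                 V3 = c(7, 2, 8, 1, 5), V4 = c(1, 2, 3, 4, 6),
                 V5 = c(3, 2, 5, 2, 8))

res <- df %>%
  rowwise() %>%
  mutate(
    # position of the max within V1:V2 for this row
    indexD1 = which.max(c_across(V1:V2)),
    # take all V values for this row, then keep positions indexD1 + 1 to 5
    A1 = min(c_across(V1:V5)[(indexD1 + 1):5])
  ) %>%
  ungroup()
```

The subsetting happens on the vector returned by `c_across`, not on the column selection itself, which is what lets the range differ per row.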

I also tried to write the range as a string:


df$A1_range <- "{df$A1_start}:{df$A1_end}"

But this just creates a variable containing the literal text "{df$A1_start}:{df$A1_end}" in every row.

I also found another post that used subset, and tried that in a pipe, but I get an error if I do that:

df <- df %>% rowwise() %>% mutate(A1test = min(subset(., select = A1_startname:A1_endname)))

(note: in my real data I calculated A1_startname and A1_endname, which are the column names as strings instead of indexes, as well)

The problem is: even when I can get the code to calculate a value for A1, it uses the A1_start value of the first row (ID=1) as the start of the range for every row. In some rows this is incorrect. For example, for ID=5, D1 is the value in V1, so the range for A1 should start at V2, but now it starts at V3.
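This behaviour is what you would expect from `:`, which takes scalar endpoints: as far as I know, when it is handed a vector it uses only the first element (normally with a warning). A small sketch of the difference on the example data, with the start positions written out by hand; `v_cols`, `shared` and `ranges` are hypothetical names for illustration:

```r
df <- data.frame(ID = 1:5,
                 V1 = c(2, 5, 2, 8, 3), V2 = c(3, 4, 4, 7, 1),
                 V3 = c(7, 2, 8, 1, 5), V4 = c(1, 2, 3, 4, 6),
                 V5 = c(3, 2, 5, 2, 8))

v_cols   <- paste0("V", 1:5)
a1_start <- c(3, 2, 3, 2, 2)  # indexD1 + 1 for each row, as V positions

# Writing a1_start:5 is effectively a1_start[1]:5, i.e. 3:5 for every row
shared <- a1_start[1]:5

# One call to `:` per row gives each row its own range
ranges <- lapply(a1_start, function(s) s:5)

# Row 5 (ID = 5): shared range versus its own range
min_shared <- min(unlist(df[5, v_cols[shared]]))       # V3:V5, gives 5
min_own    <- min(unlist(df[5, v_cols[ranges[[5]]]]))  # V2:V5, gives 1
```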

Can someone help me find a way to use a variable range inside a function that finds the minimum? Thanks!

Edited to include desired output:

If the function works, it should look something like this:

df <- data.frame(ID = 1:5, V1 = c(2, 5, 2, 8, 3), V2 = c(3, 4, 4, 7, 1), V3 = c(7, 2, 8, 1, 5), V4 = c(1, 2, 3, 4, 6), V5 = c(3, 2, 5, 2, 8), D1 = c(3, 5, 4, 8, 3), D1index = c(2, 1, 2, 1, 1), A1start = c(3, 2, 3, 2, 2), A1 = c(1, 2, 3, 1, 1))
df

If the range for A1 does not change according to the row (so if it takes the value A1start[1] as the start of the range for /all/ rows in the data frame), then you will get an incorrect A1 in ID=5, because in the range 3:5 the smallest value would be 5, but the actual correct value for A1 should be 1 in that row (because the range starts at V2 for this row).

Hope this helps. :)

Note: I just created a very simple data frame to illustrate, but the real data are not integers and have 6 digits/decimals. So for the real data I think we can safely assume that there will be no duplicate values anywhere.

Note2: I added D1index and A1start to the dataframe as in-between steps. However, if A1 can be calculated without these 2 variables, that would also be fine. So the desired output could also possibly just be:

df <- data.frame(ID = 1:5, V1 = c(2, 5, 2, 8, 3), V2 = c(3, 4, 4, 7, 1), V3 = c(7, 2, 8, 1, 5), V4 = c( 1, 2,3, 4, 6), V5 = c(3, 2, 5, 2, 8), D1 = c(3, 5,4,8,3), A1 = c(1, 2, 3, 1,1))
df


Solution

  • So after some feedback, I found a solution by creating a for-loop that goes through the data rowwise. Like this:

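    library(dplyr)

    # A1_start and A1_end are positions within V1:V5, not within df (ID is column 1)
    v_cols <- paste0("V", 1:5)
    df <- df %>% mutate(A1_start = indexD1 + 1, A1_end = 5)

    df$A1 <- NA_real_
    for (i in seq_len(nrow(df))) {
      # columns in this row's own range
      A1_range <- v_cols[df$A1_start[i]:df$A1_end[i]]
      # assign only row i, so earlier rows are not overwritten
      df$A1[i] <- min(unlist(df[i, A1_range]))
    }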

    Still, I would be interested to know if there are other solutions!
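One possible alternative without a loop, offered as a sketch rather than a definitive answer: copy the value columns into a matrix, overwrite everything before each row's start position with `Inf`, and take the row minimum. It assumes the value columns are V1 to V5 and contain no missing values; `m`, `index_d1`, `start` and `m_masked` are names invented here.

```r
df <- data.frame(ID = 1:5,
                 V1 = c(2, 5, 2, 8, 3), V2 = c(3, 4, 4, 7, 1),
                 V3 = c(7, 2, 8, 1, 5), V4 = c(1, 2, 3, 4, 6),
                 V5 = c(3, 2, 5, 2, 8))

m <- as.matrix(df[, paste0("V", 1:5)])

# Position of the row maximum within V1:V2 (first one wins on ties)
index_d1 <- max.col(m[, 1:2], ties.method = "first")
start    <- index_d1 + 1

# col(m) holds each cell's column position; `start` has one entry per row
# and is recycled down the columns, so cell [i, j] is compared with start[i]
m_masked <- m
m_masked[col(m) < start] <- Inf

df$D1 <- m[cbind(seq_len(nrow(m)), index_d1)]  # value at the max position
df$A1 <- unname(apply(m_masked, 1, min))       # min over each row's own range
```

On the example data this gives A1 = 1, 2, 3, 1, 1, matching the desired output in the question. Masking keeps the whole computation on one matrix, which may matter more on large data than on five rows.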