r dplyr

pmax (pmin) na.rm not working - Problem with NA/NaN Argument


I am having a problem with the pmax (and pmin) function. I know it is used to get row-wise max (min) values. I want those max and min values so that I can rescale some columns to a new range that is balanced for my analysis. Columns a:g define the old range, and e:g define the new range that I want to use. My data frame is not exactly the same (it is quite big, actually), but for the sake of getting to the point, let's say the df is like this:

# Example df

ind <- c("A","B","C")
y <- c(2008,2012,2016,2020)
indiv <- rep(ind, times=4)
year <- rep(y, times=3)

a <- runif(n=12, min=0, max=100)
b <- runif(n=12, min=0, max=100)
c <- runif(n=12, min=0, max=100)
d <- runif(n=12, min=0, max=100)
e <- runif(n=12, min=0, max=100)
f <- runif(n=12, min=0, max=100)
g <- runif(n=12, min=0, max=100)

df_data <- data.frame(indiv,year,a,b,c,d,e,f,g)

# Code for max min and new range

newdf <- df_data %>% 
  mutate(Oldmax = pmax(a:g,na.rm=TRUE)) %>% 
  mutate(Oldmin = pmin(a:g,na.rm=TRUE)) %>% 
  mutate(Newmax = pmax(e:g,na.rm=TRUE)) %>% 
  mutate(Newmin = pmin(e:g,na.rm=TRUE)) %>% 
  mutate(Oldrange = Oldmax-Oldmin) %>% 
  mutate(Newrange = Newmax-Newmin) %>% 
  mutate(across(e:g,
                (((~ .x - Oldmin) * Newrange) / Oldrange) + Newmin,
                .names = "{.col}_bal")
         )

The console reports a problem with an NA/NaN argument, even though I set na.rm to TRUE. Any further advice regarding pmax and row-wise functions would be greatly appreciated. Thanks in advance.


Solution

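  • First of all, thanks to @Iroha for the huge help. The problem was mixing a base R function with tidyselect syntax: pmax() and pmin() do not understand column ranges, and inside mutate() a:g is not expanded to the columns a through g the way it is inside across() or select(). Instead it is evaluated as the ordinary : sequence operator on the column vectors a and g, so pmax() receives a single vector rather than seven columns. This left me quite confused (I know, rookie mistake :p).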

    Hence, to fix the problem, select the columns with pick() and pass them to the function through do.call(). The fixed code is the following:

    library(dplyr)  # provides %>%, mutate(), across(); pick() needs dplyr >= 1.1.0
    
    # Example df
    
    ind <- c("A","B","C")
    y <- c(2008,2012,2016,2020)
    indiv <- rep(ind, times=4)
    year <- rep(y, times=3)
    
    a <- runif(n=12, min=0, max=100)
    b <- runif(n=12, min=0, max=100)
    c <- runif(n=12, min=0, max=100)
    d <- runif(n=12, min=0, max=100)
    e <- runif(n=12, min=0, max=100)
    f <- runif(n=12, min=0, max=100)
    g <- runif(n=12, min=0, max=100)
    
    df_data <- data.frame(indiv,year,a,b,c,d,e,f,g)
    
    # Code for max min and new range
    
    newdf <- df_data %>% 
      mutate(Oldmax = do.call(pmax,c(pick(a:g),na.rm=TRUE)),
             Oldmin = do.call(pmin,c(pick(a:g),na.rm=TRUE)),
             Newmax = do.call(pmax,c(pick(e:g),na.rm=TRUE)),
             Newmin = do.call(pmin,c(pick(e:g),na.rm=TRUE)),
             Oldrange = Oldmax-Oldmin,
             Newrange = Newmax-Newmin) %>% 
      mutate(across(e:g,
                    # the whole expression goes to the right of ~ so it forms a single lambda
                    ~ ((.x - Oldmin) * Newrange) / Oldrange + Newmin,
                    .names = "{.col}_bal")
             )
    
    

    There is no need for do.call() inside across(), though, since e:g is a supported column selection there. Read up on do.call() if you do not know how it works: it can be super useful even if you do not remember it all the time (as happened in my case).
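
    For anyone unsure what do.call() is doing here, this is a minimal base R sketch with made-up values (the data frame cols is only for illustration and stands in for what pick(a:g) returns, which is a data frame of the selected columns). Each column becomes one argument to pmax()/pmin(), and na.rm = TRUE is appended as a named argument:

    ```r
    # Toy stand-in for pick(a:c): a data frame with a few NA values,
    # placed so that no row is entirely NA.
    cols <- data.frame(a = c(1, NA, 7),
                       b = c(4, 2, NA),
                       c = c(3, 9, 5))

    # c() on a data frame plus a named value gives a plain list:
    # list(a = ..., b = ..., c = ..., na.rm = TRUE).
    args <- c(cols, na.rm = TRUE)

    # do.call() spreads the list elements out as separate arguments, so this
    # is the same as pmax(cols$a, cols$b, cols$c, na.rm = TRUE).
    row_max <- do.call(pmax, args)
    row_min <- do.call(pmin, args)

    print(row_max)  # 4 9 7
    print(row_min)  # 1 2 5
    ```

    The same pattern scales to any number of columns, which is why it is handy when the range is given as a:g rather than typed out by hand.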

    I hope anyone dealing with this kind of problem finds it useful :)
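
    As a quick sanity check of the rescaling expression used inside across(), here is a small sketch on one made-up row. The helper rescale_row() is hypothetical (it is not part of the code above) and the numbers are invented: the old range runs from 10 to 90 and the new range from 30 to 50.

    ```r
    # Hypothetical helper: linear rescale of x from [old_min, old_max]
    # to [new_min, new_max], the same arithmetic as in the across() call.
    rescale_row <- function(x, old_min, old_max, new_min, new_max) {
      old_range <- old_max - old_min
      new_range <- new_max - new_min
      (x - old_min) * new_range / old_range + new_min
    }

    # The old minimum and maximum map onto the ends of the new range,
    # and values in between are placed proportionally.
    balanced <- rescale_row(c(10, 90, 30, 50),
                            old_min = 10, old_max = 90,
                            new_min = 30, new_max = 50)
    print(balanced)  # 30 50 35 40
    ```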