rvectorizationwarnings

How should I handle many warnings from unused values computed in a Vectorised function?


I'm using the Vectorise function on psych::phi2poly, and then selecting certain values from the output using ifelse. Unfortunately phi2poly raises warnings whenever it sees a NA. The warnings are irrelevant as the ifelse handles NAs without using phi2poly. MWE below.

I'm reluctant to suppress all warnings from phi2poly as that could mask other issues. Is there a clean way to handle this?

MWE:

library(psych)

r_biserial <- function(p, r_pb) {
  # Jacobs' formula, r_b = sqrt(pq)/f(z_p) . r_pb:
  sqrt(p * (1-p)) / dnorm(qnorm(p)) * r_pb
}

r_tetrachoric <- function(a, b, phi) {
  # This causes a bucketload of warnings
  Vectorize(phi2poly)(a, b, phi)

  # This gets rid of the warnings but is a total hack
  # Vectorize(phi2poly)(
  #   a |> replace_na(0.5),
  #   b |> replace_na(0.5),
  #   phi |> replace_na(0.5))
}

adjust_correlation <- function(p, q, cor) {
  ifelse(is.na(p), 
         ifelse(is.na(q), 
                cor, 
                r_biserial(q, cor)), 
         ifelse(is.na(q),
                r_biserial(p, cor), 
                r_tetrachoric(p, q, cor)))
}

adjust_correlation(
  c(NA, NA, 0.5, 0.5),
  c(NA, 0.6, NA, 0.6),
  c(0.3, 0.3, 0.3, 0.3)
)

Edit: Speed is not an issue here. In the end I went with a modification of the accepted answer that I preferred (as it contained the phi2poly issue more locally).

r_tetrachoric <- function(a, b, phi) {
  # We cannot call phi2poly with NAs
  call = !is.na(a) & !is.na(b)

  result <- rep(NA_real_, length(a)) 
  result[call] <- Vectorize(phi2poly)(phi[call], a[call], b[call])
  result
}

adjust_correlation <- function(p, q, cor) {
  case_when(
    is.na(p) & is.na(q)   ~ cor,
    is.na(p) & !is.na(q)  ~ r_biserial(q, cor),
    !is.na(p) & is.na(q)  ~ r_biserial(p, cor),
    !is.na(p) & !is.na(q) ~ r_tetrachoric(p, q, cor)
  )
}

Solution

  • Definitely avoid the ifelses. Make a data.frame and fill a column successively with results using boolean vectors for sub-setting - definitely faster and much easier to read.

    > library(psych)
    > 
    > r_biserial <- function(p, r_pb) {sqrt(p*(1 - p))/dnorm(qnorm(p))*r_pb}
    > 
    > adjust_correlation <- function(p, q, corr=.3) {
    +   data <- data.frame(p, q, corr, res=NA_real_)
    +   na_pq <- with(data, is.na(p) & is.na(q))
    +   na_p <- with(data, is.na(p) & !is.na(q))
    +   na_q <- with(data, !is.na(p) & is.na(q))
    +   na_0 <- with(data, !is.na(p) & !is.na(q))
    +   data$res[na_pq] <- corr
    +   data$res[na_p] <- r_biserial(q[na_p], corr)
    +   data$res[na_q] <- r_biserial(p[na_q], corr)
    +   data$res[na_0] <- Vectorize(phi2poly)(p[na_0], q[na_0], corr)
    +   data$res
    + }
    > 
    > adjust_correlation(
    +   p=c(NA, NA, 0.5, 0.5),
    +   q=c(NA, 0.6, NA, 0.6)
    + )
    [1] 0.3000000 0.3804121 0.3759942 0.8504852
    

    I assumed you want corr always be the same value, but you can still change to corr[<na_vector>] to be able to use corr as a vector in the arguments.