r  if-statement  substring

Setting different conditions on multiple columns of a data frame in one go


I need to write a condition that says: if 9 columns of a big data frame contain any of the following: "6%t" (as the first three characters of a value), NA, or "", then save the data frame as a CSV file. I am having difficulty setting up the conditions correctly. The aim is to make sure I save the right data frame.

Assume that cols holds the names of the 9 columns I need to check in my data frame.

cols <- c("AA", "BB", "SS", "EE", "OO", "UU", "PP", "QQ", "FF")
if (substring(df[,cols], 1, 3) == "6%t" || is.na(df[,cols]) || df[,cols] == "") {
  write.csv(df, file = paste0(path, ".csv"))}

However, I get the following error:

the condition has length > 1

Could you please help me figure this out?


Solution

  • Always provide some sample data with your code. The problem is simple: the conditions have the wrong dimensions. An if statement needs a single TRUE or FALSE, not a logical vector or matrix. Look:

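    # Sample data: a 4 x 5 character matrix containing one "6%t" value, one NA and one ""
    library(stringr)  # assumption: words is the word list shipped with stringr
    aux <- sample(c(words, 1000:1999), 17)
    aux <- sample(c(aux, "6%tFoo", NA, ""))
    aux <- structure(aux, dim = 4:5)
    colnames(aux) <- letters[1:5]
    aux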
    
         a          b            c          d          e     
    [1,] "the"      "particular" "consider" "6%tFoo"   "1649"
    [2,] "1167"     "1402"       "of"       "along"    NA    
    [3,] "question" ""           "many"     "exercise" "1709"
    [4,] "1397"     "1152"       "oppose"   "problem"  "1114"
    
    # The comparison returns one logical value per cell, not a single TRUE/FALSE
    substring(aux, 1, 3) == "6%t"
    
         a     b     c     d     e
    [1,] FALSE FALSE FALSE  TRUE FALSE
    [2,] FALSE FALSE FALSE FALSE    NA
    [3,] FALSE FALSE FALSE FALSE FALSE
    [4,] FALSE FALSE FALSE FALSE FALSE
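
    A minimal sketch of why such a result cannot go straight into if, and how any() collapses it to one value. The vector x is made up for illustration and is not part of the data above.

```r
# Hypothetical values: one match, one NA, one empty string, one ordinary word
x <- c("6%tFoo", NA, "", "abc")

# One comparison result per element; the NA input gives an NA result
hits <- substring(x, 1, 3) == "6%t"
hits

any(hits)                     # TRUE: at least one element matched
any(hits[-1])                 # NA: no match, and the NA leaves the answer unknown
any(hits[-1], na.rm = TRUE)   # FALSE: NA comparisons are ignored
```

    An NA condition makes if fail as well, so the NA case has to be handled explicitly before the result is used.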
    

    Your dataset appears to contain NA values, and substring returns NA for them, so the comparison yields NA rather than FALSE (see column e above). Try this:

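    # TRUE if any value starts with "6%t", is NA, or is an empty string.
    # \(x) is shorthand for function(x) and needs R >= 4.1.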
    test <- \(x) any((sapply(x, substring, 1, 3) == "6%t") & !is.na(x)) ||
      any(is.na(x)) ||
      any(x == "")
    
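    # Column b contains an empty string
    cols <- "b"
    test(aux[, cols])
    [1] TRUE
    
    # Column d contains a value starting with "6%t"
    cols <- "d"
    test(aux[, cols])
    [1] TRUE
    
    # Column e contains an NA
    cols <- "e"
    test(aux[, cols])
    [1] TRUE
    
    # Columns a and c contain none of the three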
    cols <- c("a", "c")
    test(aux[, cols])
    [1] FALSE
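
    Applied to a data frame as in the question, the same idea might look like the sketch below. The data frame df, its contents and the output path are placeholders invented for illustration, and startsWith() is swapped in for substring(); neither is taken from the code above.

```r
# Hypothetical stand-in for the asker's data frame
df <- data.frame(
  AA = c("6%tabc", "xyz"),
  BB = c("foo", NA),
  SS = c("bar", "baz"),
  stringsAsFactors = FALSE
)
cols <- c("AA", "BB")

# Flatten the selected columns into one character vector
vals <- unlist(df[cols], use.names = FALSE)

# A single TRUE/FALSE: is any cell NA, "", or does it start with "6%t"?
# is.na() comes first, so an NA cell always evaluates to TRUE, never NA.
flagged <- any(is.na(vals) | vals == "" | startsWith(vals, "6%t"))

path <- tempfile()  # placeholder output location
if (flagged) {
  write.csv(df, file = paste0(path, ".csv"))
}
```

    Because flagged has length one and is never NA, the if statement no longer complains about the condition.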
    

    EDITED after the first reply.

    To check whether all elements of the selected columns are either NA, "", or start with "6%t":

    aux <- structure(
      list(
        a = c("the", "1167", "question", "1397"), 
        b = c("", "", "", NA), 
        c = c("consider", "of", "many", "oppose"), 
        d = c("6%tFoo", "6%talong", "", NA), 
        e = c("1649", NA, "1709", "1114")), 
      
      row.names = c(NA, -4L), 
      class = "data.frame")
    
    param   <- c(NA, "", "6%t")
    
    # Columns b and d hold only allowed values, so the condition must be TRUE
    cols    <- c("b", "d") 
    aux_sub <- sapply(aux[, cols], substring, 1, 3)
    
    # No value outside param means every element is allowed
    length(setdiff(aux_sub, param)) == 0
    [1] TRUE
    
    # Column a holds other values, so the condition must be FALSE
    cols    <- c("a", "b", "d")
    aux_sub <- sapply(aux[, cols], substring, 1, 3)
    
    length(setdiff(aux_sub, param)) == 0
    [1] FALSE
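
    A possible alternative, sketched under the same assumptions, checks each column separately so you can see which one fails. The helper is_flagged and the reduced copy of aux are illustrative names and data, not part of the code above.

```r
# Reduced copy of the example data (columns a, b and d only)
aux <- data.frame(
  a = c("the", "1167", "question", "1397"),
  b = c("", "", "", NA),
  d = c("6%tFoo", "6%talong", "", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE for a cell that is NA, "" or starts with "6%t"
is_flagged <- function(x) is.na(x) | x == "" | startsWith(x, "6%t")

# One logical per column: are all of its cells flagged?
col_ok <- vapply(aux, function(x) all(is_flagged(x)), logical(1))
col_ok

all(col_ok[c("b", "d")])        # TRUE
all(col_ok[c("a", "b", "d")])   # FALSE, because column a fails
```

    The result of all() is a single TRUE or FALSE, so it can be used directly as the condition of an if statement.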