I need to write a condition that says: if 9 columns of a large data frame contain any of the following: "6%t" (as the first three characters of a value), NA, or "", then save the file as CSV. I am having difficulty setting up the conditions correctly. The aim is to make sure I save the right data frame.
Assume that cols is the 9 columns that I need to check in my data frame.
cols <- c("AA", "BB", "SS", "EE", "OO", "UU", "PP", "QQ", "FF")
if (substring(df[,cols], 1, 3) == "6%t" || is.na(df[,cols]) || df[,cols] == "") {
write.csv(df, file = paste0(path, ".csv"))}
However, I get the following error:
the condition has length > 1
Could you please help me figure this out?
Always provide some data with your code. The problem is simple: the conditions have the wrong dimensions. The if statement needs a single TRUE or FALSE, not a logical vector. Look:
#
# `words` is assumed to be the sample character vector from the stringr package
aux <- sample(c(words, 1000:1999), 17)
aux <- sample(c(aux, "6%tFoo", NA, ""))
aux <- structure(aux, dim = 4:5)
colnames(aux) <- letters[1:5]
a b c d e
[1,] "the" "particular" "consider" "6%tFoo" "1649"
[2,] "1167" "1402" "of" "along" NA
[3,] "question" "" "many" "exercise" "1709"
[4,] "1397" "1152" "oppose" "problem" "1114"
#
substring(aux, 1, 3) == "6%t"
a b c d e
[1,] FALSE FALSE FALSE TRUE FALSE
[2,] FALSE FALSE FALSE FALSE NA
[3,] FALSE FALSE FALSE FALSE FALSE
[4,] FALSE FALSE FALSE FALSE FALSE
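As a minimal sketch of the same point, assuming a made-up vector `x` (not part of the question's data): the comparison returns one value per element, and `any()` collapses that into the single TRUE or FALSE that `if` needs.

```r
# Hypothetical example vector; the names here are illustrative only.
x <- c("6%tFoo", "abc", NA, "")

# Element-wise comparison: one result per element, NA where x is NA.
starts <- substring(x, 1, 3) == "6%t"

# any() collapses the vector to a single value; na.rm = TRUE skips the NA.
has_prefix <- any(starts, na.rm = TRUE)
has_na     <- anyNA(x)
has_empty  <- any(x == "", na.rm = TRUE)

# Each operand is now length one, so || and if() behave as intended.
if (has_prefix || has_na || has_empty) {
  message("at least one flagged value found")
}
```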
Your dataset appears to have NA values, so be aware of substring behavior with them. Try this:
#
test <- \(x) any((sapply(x, substring, 1, 3) == "6%t") & !is.na(x)) ||
any(is.na(x)) ||
any(x == "")
#
cols <- "b"
test(aux[, cols])
[1] TRUE
#
cols <- "d"
test(aux[, cols])
[1] TRUE
#
cols <- "e"
test(aux[, cols])
[1] TRUE
#
cols <- c("a", "c")
test(aux[, cols])
[1] FALSE
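To connect this back to the original goal of saving the file, here is a rough sketch under stated assumptions: `df` is a small made-up stand-in for the real data frame, `cols` holds only two of the nine columns, and `out_file` is a temporary path chosen purely for illustration.

```r
# Small stand-in for the asker's data frame (values are made up).
df <- data.frame(
  AA = c("6%tabc", "x"),
  BB = c("y", NA),
  SS = c("z", "w"),
  stringsAsFactors = FALSE
)
cols <- c("AA", "BB")

# Work on a character matrix so every cell is checked at once.
m <- as.matrix(df[, cols, drop = FALSE])

# TRUE for each cell that is NA, empty, or starts with "6%t".
# is.na() comes first, so NA cells evaluate to TRUE rather than NA.
flagged <- is.na(m) | m == "" | substring(m, 1, 3) == "6%t"

# Hypothetical output path; replace with the real one.
out_file <- tempfile(fileext = ".csv")

# any() gives the single TRUE/FALSE that if() requires.
if (any(flagged)) {
  write.csv(df, file = out_file, row.names = FALSE)
}
```

Using `any()` saves the file when at least one cell matches; swap in `all()` if every cell must match.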
EDITED after the first reply.
To check whether all elements of the selected columns are either NA, "", or start with "6%t":
aux <- structure(
list(
a = c("the", "1167", "question", "1397"),
b = c("", "", "", NA),
c = c("consider", "of", "many", "oppose"),
d = c("6%tFoo", "6%talong", "", NA),
e = c("1649", NA, "1709", "1114")),
row.names = c(NA, -4L),
class = "data.frame")
param <- c(NA, "", "6%t")
# Condition must be true
cols <- c("b", "d")
aux_sub <- sapply(aux[, cols], substring, 1, 3)
(test <- length(setdiff(aux_sub, param)) == 0)  # outer parentheses print the result
[1] TRUE
# Condition must be false
cols <- c("a", "b", "d")
aux_sub <- sapply(aux[, cols], substring, 1, 3)
(test <- length(setdiff(aux_sub, param)) == 0)  # outer parentheses print the result
[1] FALSE
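The same "all elements" check can also be written without setdiff. The following is only a sketch: `all_flagged` is a hypothetical helper name, and the data frame repeats three of the columns from the example above.

```r
# Hypothetical helper: TRUE only if every cell in the chosen columns is
# NA, "", or starts with the given prefix.
all_flagged <- function(data, cols, prefix = "6%t") {
  m <- as.matrix(data[, cols, drop = FALSE])
  # is.na() comes first, so NA cells evaluate to TRUE rather than NA.
  ok <- is.na(m) | m == "" | substring(m, 1, nchar(prefix)) == prefix
  all(ok)
}

aux <- data.frame(
  a = c("the", "1167", "question", "1397"),
  b = c("", "", "", NA),
  d = c("6%tFoo", "6%talong", "", NA)
)

all_flagged(aux, c("b", "d"))       # TRUE: every cell matches
all_flagged(aux, c("a", "b", "d"))  # FALSE: column a has other values
```

The result is always a single logical value, so it can be used directly in `if (all_flagged(df, cols)) write.csv(...)`.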