This site has lots of questions on how to fix an "undefined column" error.
I have the exact opposite question: how to make an "undefined column" error.
I frequently change variable names in my files.
This leads to the following error:
r$> df <- data.frame(gender=c(1,1,NA,0))
r$> sum(is.na(df$male))
[1] 0
when the correct result is 1.
I want R to throw an error if the column I'm trying to access is undefined, rather than fail silently.
How can I do that?
Unfortunately R is rather too lenient when it comes to such matters. The $
operator for data.frames is defined to allow accessing non-existent columns and to return NULL
in that case.
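To see why the result is 0 rather than an error, here is a small sketch using the data frame from the question. It also shows, as a point of comparison, that character indexing with `[` does raise an error for a missing column; the variable `msg` is just a name chosen for this illustration.

```r
df <- data.frame(gender = c(1, 1, NA, 0))

# `$` on a missing column quietly yields NULL ...
is.null(df$male)      # TRUE

# ... and is.na(NULL) is an empty logical vector, so its sum is 0.
is.na(NULL)           # logical(0)
sum(is.na(df$male))   # 0

# By contrast, `[` with an unknown column name signals an error.
msg <- tryCatch(
  df[, "male"],
  error = function(e) conditionMessage(e)
)
print(msg)            # "undefined columns selected"
```

So if you can live with writing `df[, "male"]` instead of `df$male`, base R already gives you the error you are asking for.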
There are alternative data.frame implementations which are a bit stricter. Notably, the tbl_df
data structure used by the Tidyverse packages ‘tibble’, ‘dplyr’, etc. will at least show you a warning:
df <- tibble::tibble(gender = c(1, 1, NA, 0))
sum(is.na(df$male))
# [1] 0
# Warning message:
# Unknown or uninitialised column: `male`.
Alternatively, you can make this a hard error by overriding $
for data.frames:
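registerS3method(
    '$', 'data.frame',
    \(x, name) {
        # Fail if the requested column does not exist (exact match only).
        stopifnot(name %in% colnames(x))
        # Otherwise fall through to the default `$` behaviour.
        NextMethod('$')
    }
)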
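As a rough check of how such an override behaves, the sketch below repeats the registration for 'data.frame' so that it runs on its own (written with `function` rather than `\(...)` so it also works on R versions before 4.1). The names `ok`, `res` and `partial` are only for illustration.

```r
# Register a strict `$` for plain data.frames (same idea as above).
registerS3method(
  "$", "data.frame",
  function(x, name) {
    stopifnot(name %in% colnames(x))
    NextMethod("$")
  }
)

df <- data.frame(gender = c(1, 1, NA, 0))

# Existing columns still work as before.
ok <- sum(is.na(df$gender))
print(ok)  # 1

# A missing column is now an error instead of NULL.
res <- tryCatch(df$male, error = function(e) "error")
print(res)  # "error"

# Side effect: partial matching (df$gen for df$gender) is rejected too.
partial <- tryCatch(df$gen, error = function(e) "error")
print(partial)  # "error"
```

Keep in mind that the override is global for the session, so it also affects any package code that relies on `df$x` returning NULL for a missing column.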
However, note that this will only apply to plain data.frames, not to tibbles, since the latter also override $. There does not seem to be an option to make this a hard error for tibbles, short of turning all warnings into errors; this might be a nice feature request for the package. Alternatively, you can make the above code apply to tibbles by replacing 'data.frame' with 'tbl_df'.
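If you go the "warnings into errors" route, `options(warn = 2)` does it for the whole session. A more local variant is sketched below; `warnings_as_errors` is a hypothetical helper (not part of base R or tibble), and the tibble part assumes that the package is installed and that its message is signalled as an ordinary warning condition.

```r
# Hypothetical helper: evaluate `expr`, turning any warning into an error.
warnings_as_errors <- function(expr) {
  withCallingHandlers(
    expr,
    warning = function(w) stop(conditionMessage(w), call. = FALSE)
  )
}

# Base R illustration: as.integer("x") normally only warns.
res <- tryCatch(
  warnings_as_errors(as.integer("x")),
  error = function(e) "error"
)
print(res)  # "error"

# Expressions that do not warn are returned unchanged.
warnings_as_errors(1 + 1)  # 2

# With tibble (only run if the package is available):
if (requireNamespace("tibble", quietly = TRUE)) {
  tbl <- tibble::tibble(gender = c(1, 1, NA, 0))
  out <- tryCatch(
    warnings_as_errors(sum(is.na(tbl$male))),
    error = function(e) conditionMessage(e)
  )
  print(out)
}
```

This keeps the strictness limited to the wrapped expression, so unrelated warnings elsewhere in the session are left alone.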