
ROracle creates shadow or invisible columns in R


Working with a data.frame that was read into RStudio Server with ROracle, I came across some very strange behaviour. There seemed to be an extra column in the data that R didn't seem to detect at first, but that was nevertheless very much there.

I wasn't aware of Oracle hidden fields before and came across this question, which could explain this phenomenon: What are Oracle hidden fields?

However, it is still extremely bizarre to me that this is even possible in R, and the inconsistencies are baffling. Of course, the example is not reproducible, because it is based on a specific Oracle dataset that was successfully loaded into R with dbReadTable. I'd just like to highlight, for anyone using ROracle, that this is what you could get in RStudio and what it looks like from the RStudio perspective when a hidden column gets loaded into R.

Can somebody explain the inherent difference between the $ operator and [[]] or exists(), and why the $ operator seems to be the only way to detect this column?

> EXAMPLE_TABLE <-
+   dbReadTable(
+     con_ROracle,
+     schema  = SCHEMA_NR,
+     name  = TABLE_NAME) %>%
+   head(100)
> 
> # names doesn't find the column
> 
> "L" %in% names(EXAMPLE_TABLE)
[1] FALSE
> 
> # subsetting with [["L"]] doesn't find it
> EXAMPLE_TABLE[["L"]]
NULL
> 
> # the function "exists" doesn't find it
> 
> exists("L", EXAMPLE_TABLE)
[1] FALSE
> 
> # dplyr selection doesn't find it
> 
> EXAMPLE_TABLE %>% 
+   select(L)
Error: Can't subset columns that don't exist.
x The column `L` doesn't exist.
Run `rlang::last_error()` to see where the error occurred.
> 
> # But the $ operator does find it!
> EXAMPLE_TABLE$L 
  [1]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
 [53]  1  1  1  1  1  1  1  1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
> 
> # And replacing it with a NULL value doesn't work
> 
> EXAMPLE_TABLE$L <- NULL
> EXAMPLE_TABLE$L 
  [1]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
 [53]  1  1  1  1  1  1  1  1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
> 
> # The values of the hidden field are accessible
> new_value <- EXAMPLE_TABLE$L
> new_value
  [1]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
 [53]  1  1  1  1  1  1  1  1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
> 
> 
> # Only replacing with new values helps
> 
> EXAMPLE_TABLE$L <- 5
> EXAMPLE_TABLE$L
  [1] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
 [79] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5

>

Strangely, the hidden column cannot be removed with <- NULL, but it can be overwritten by assigning a new value.


Solution

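  • When you select a column with $, R does not require an exact match: $ does partial matching on names. Your table must contain exactly one column whose name starts with L, and $L resolves to that column. names(), [[ ]] (by default), exists() and dplyr's select() all match names exactly, so they correctly report that there is no column called L. The same applies to assignment, which always matches exactly: $L <- NULL tries to remove a column named exactly L (there is none, so nothing changes), while $L <- 5 creates a new column L, which $L then matches exactly.

    For an example, look at mtcars: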

     colnames(mtcars)
     # both return the column corresponding to mpg
     mtcars$mpg
     mtcars$m
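
The whole transcript from the question can probably be reproduced without Oracle. The sketch below is only an illustration: the data frame and the column name LOAD_ID are made up, standing in for whatever column in the real table starts with L.

```r
# Toy data frame standing in for the Oracle table (hypothetical columns).
# Note that there is no column named exactly "L".
df <- data.frame(LOAD_ID = c(1, 1, NA), NAME = c("a", "b", "c"))

"L" %in% names(df)        # names() reports exact names only
df[["L"]]                 # [[ matches exactly by default
df[["L", exact = FALSE]]  # partial matching can be switched on for [[
df$L                      # $ partially matches "L" to LOAD_ID

# Assignment matches exactly: there is no column "L" to remove,
# so this is a no-op and df$L still resolves to LOAD_ID.
df$L <- NULL
names(df)

# Assigning a value creates a real column "L";
# from now on df$L matches it exactly instead of LOAD_ID.
df$L <- 5
names(df)
```

If this kind of surprise is a concern, setting options(warnPartialMatchDollar = TRUE) makes R emit a warning whenever $ falls back to a partial match.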