Tags: r, apache-spark, rstudio, sparklyr

SparklyR removing a tbl from Spark Context


Similar to: SparklyR removing a Table from Spark Context, but different because:

The linked question asks how to remove a "table" from Spark that was created with the copy_to() function. If the spark_read_csv() function is used instead, the resulting object appears to have a different class.

my_csv <- spark_read_csv(sc, "name")
db_drop_table(my_csv)

returns:

Error in UseMethod("db_drop_table") : 
  no applicable method for 'db_drop_table' applied to an object of class "c('tbl_spark', 'tbl_sql', 'tbl_lazy', 'tbl')"

This further indicates that the object created here is not a table but a tbl, Hadley's data type of choice.
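
For reference, inspecting the object confirms the class vector reported in the error (a minimal check, assuming the my_csv object created above):

class(my_csv)
# [1] "tbl_spark" "tbl_sql"   "tbl_lazy"  "tbl"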

Therefore, how can I remove a specific tbl, and only that tbl, from memory/the session without exiting the full session?

Bonus: is there a button in the RStudio Server interface that I've missed that will do this for me? I can't see an obvious way to do it in the Spark connection tab.


Solution

  • In general, with sparklyr:

    You can remove a specific temporary view from the catalog using the dropTempView method:

    sc %>% spark_session() %>% invoke("catalog") %>%
      invoke("dropTempView", "my_table")
    

    or clear the cache with the clearCache method:

    sc %>% spark_session() %>% invoke("catalog") %>% 
      invoke("clearCache")
    

    Unless you're worried about name clashes, you should probably focus on the second one, although I'd recommend avoiding eager caching unless it is strictly necessary. A short end-to-end sketch of both options follows below.
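
    The following is a minimal sketch combining both calls, assuming a local connection; the table name "my_table" comes from the snippet above, while the CSV path is a placeholder:

    library(sparklyr)
    library(dplyr)

    sc <- spark_connect(master = "local")

    # spark_read_csv() registers a temporary view under the given name
    # and caches the data by default (memory = TRUE)
    my_csv <- spark_read_csv(sc, name = "my_table", path = "data/my_table.csv")

    src_tbls(sc)   # lists "my_table"

    # Drop just this temporary view from the catalog
    sc %>% spark_session() %>% invoke("catalog") %>%
      invoke("dropTempView", "my_table")

    src_tbls(sc)   # "my_table" is no longer listed

    # Or release all cached data without touching the registered names
    sc %>% spark_session() %>% invoke("catalog") %>%
      invoke("clearCache")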