Tags: r, apache-spark, sparkr, sparklyr

sparklyr: How to add '.option("overwriteSchema", "true")' to saveAsTable() on Databricks


I am running the code below in Databricks to save a table using sparklyr:

library(sparklyr)
library(dplyr)

sc <- sparklyr::spark_connect(method = "databricks")

# Read the existing Delta table and cast two columns to character
dat <- sparklyr::spark_read_table(sc, "products.output")
dat <- dat %>% dplyr::mutate(x = as.character(x), y = as.character(y))

# In a separate SQL notebook cell, drop the old table
%sql
drop table products.output

# Back in R, write the transformed data out under the same name
sparklyr::spark_write_table(x = dat, name = "products.output")

This fails with the following error:

org.apache.spark.sql.AnalysisException:
The schema of your Delta table has changed in an incompatible way since your DataFrame or
DeltaTable object was created. Please redefine your DataFrame or DeltaTable object

Is there any way I can overwrite the schema?


Solution

  • Same approach as the answer to this question: per the docs for sparklyr::spark_write_table, pass the option through the options argument as options = list(overwriteSchema = "true"), together with mode = "overwrite". This Databricks doc may help: https://docs.databricks.com/en/delta/update-schema.html#explicitly-update-schema-to-change-column-type-or-name

    # Overwrite both the data and the schema of the existing Delta table
    sparklyr::spark_write_table(x = dat, name = "products.output",
                                mode = "overwrite",
                                options = list(overwriteSchema = "true"))
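
    To confirm the overwrite took effect, you can read the table back and inspect its schema. A minimal sketch using sparklyr's sdf_schema(), which lists each column's name and Spark type:

    # Re-read the table and check that x and y are now string columns
    dat_check <- sparklyr::spark_read_table(sc, "products.output")
    sparklyr::sdf_schema(dat_check)

    Note that with this approach the %sql drop table cell is no longer needed: mode = "overwrite" plus overwriteSchema = "true" replaces the table's data and schema in one step.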