I have a PySpark DataFrame that I want to create as a Delta table in my Unity Catalog.
I have the following structure:
prd
|-- landing
|-- bronze
|-- silver
|-- gold
|-- qa
So I have my prd catalog with my qa database. My Delta table is stored in the gold database, so I loaded it into a DataFrame and now want to create a table from it in my qa database. I know I can do it using Spark SQL, like:
CREATE TABLE `table_name`
(
  -- schema here
)
USING DELTA
PARTITIONED BY (year)
TBLPROPERTIES (
  'delta.autoOptimize.autoCompact' = 'true',
  'delta.autoOptimize.optimizeWrite' = 'true',
  'delta.feature.allowColumnDefaults' = 'supported'
)
When I run this code, the table is created in prd.qa.
My question is: is there a way to create that table in prd.qa using PySpark? Something like:
df.write.format('delta').saveAsTable('prd.qa.table_name')
If you want to use df.write...saveAsTable, then you will also need to include a .partitionBy clause to force partitioning by the given column, and then set the remaining table properties afterward.
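A minimal sketch of that approach, assuming your DataFrame is df, the partition column is year, and the target table is prd.qa.table_name (Delta applies delta.-prefixed writer options as table properties on the first write to a table; properties that can't be set that way can be applied with ALTER TABLE afterward):

(df.write
    .format("delta")
    .partitionBy("year")                                  # same partitioning as the DDL above
    .option("delta.autoOptimize.autoCompact", "true")     # delta.* options on the first
    .option("delta.autoOptimize.optimizeWrite", "true")   # write become table properties
    .saveAsTable("prd.qa.table_name"))

# Feature flags are easier to set after creation:
spark.sql("""
    ALTER TABLE prd.qa.table_name
    SET TBLPROPERTIES ('delta.feature.allowColumnDefaults' = 'supported')
""")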
But really, a better way to achieve this is to use the so-called DEEP CLONE (doc, blog post): it copies all table properties, partitioning, etc. together with the data. More importantly, DEEP CLONE is incremental: if you run it again, it copies only the new data. For example, you can execute this SQL command to achieve that:
CREATE OR REPLACE TABLE prd.qa.table_name
DEEP CLONE prd.gold.table_name
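And since you want to do this from PySpark, the same command can be issued through spark.sql (same table names as above):

spark.sql("""
    CREATE OR REPLACE TABLE prd.qa.table_name
    DEEP CLONE prd.gold.table_name
""")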