pyspark, databricks, delta-lake, databricks-unity-catalog

Create a Delta table in Unity Catalog using PySpark


I have a PySpark DataFrame and I want to save it as a Delta table in my Unity Catalog.

I have the following structure:

prd
|-- landing
|-- bronze
|-- silver
|-- gold
|-- qa

I have my prd catalog with my qa database. My Delta table is stored in the gold database, so I loaded it into a DataFrame and now I want to create a table in my qa database. I know I can do it using Spark SQL like:

CREATE TABLE table_name
(
   -- schema here
)
USING delta
PARTITIONED BY (year)
TBLPROPERTIES (
  'delta.autoOptimize.autoCompact' = 'true',
  'delta.autoOptimize.optimizeWrite' = 'true',
  'delta.feature.allowColumnDefaults' = 'supported'
)

When I run this code, the table is created in prd.qa.

My question is:

Is there a way to create that table in prd.qa using PySpark? Something like:

df.write.format('delta').saveAsTable('prd.qa.table_name')


Solution

  • If you want to use df.write...saveAsTable, then you will also need to include a .partitionBy clause to force partitioning by the given column, and then set the table properties afterward (a sketch of this approach is included at the end of this answer).

    But really, a better way to achieve this is to use the so-called DEEP CLONE (doc, blog post): it copies all table properties, partitioning, etc. together with the data. More importantly, DEEP CLONE is incremental: if you run it again, it copies only the new data. For example, you can execute this SQL command to achieve that:

    CREATE OR REPLACE TABLE prd.qa.table_name
       DEEP CLONE prd.gold.table_name
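
    Since you want to stay in PySpark, the same statement can be issued from your notebook or job via spark.sql. A minimal sketch, assuming an active SparkSession named spark and that the source table prd.gold.table_name already exists:

        # Run the DEEP CLONE from PySpark; incremental on repeated runs,
        # copying only data that is new since the previous clone.
        spark.sql("""
            CREATE OR REPLACE TABLE prd.qa.table_name
            DEEP CLONE prd.gold.table_name
        """)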
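
    For completeness, here is what the df.write...saveAsTable route mentioned above might look like. This is only a sketch: the partition column year and the table properties are taken from the question, and the ALTER TABLE step is one way to handle the "set the properties afterward" part.

        # Write the DataFrame as a new Unity Catalog Delta table,
        # forcing partitioning by the `year` column.
        (
            df.write
              .format("delta")
              .partitionBy("year")
              .saveAsTable("prd.qa.table_name")
        )

        # Then set the table properties on the newly created table.
        spark.sql("""
            ALTER TABLE prd.qa.table_name SET TBLPROPERTIES (
                'delta.autoOptimize.autoCompact' = 'true',
                'delta.autoOptimize.optimizeWrite' = 'true',
                'delta.feature.allowColumnDefaults' = 'supported'
            )
        """)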