Will auto compaction break existing Z-ordered tables in Delta Lake? I'm also curious what the recommended way is to combine optimized writes, auto compaction, and Z-ordering in terms of Spark performance.
Good question. As it happens, I am currently studying for the Databricks certification. The two features do seem at odds, and the short answer to your question is YES.
Why?
ZORDER
as in %sql OPTIMIZE delta.`/mnt/delta/t1` ZORDER BY (c1, c2)
is essentially clustering: it co-locates related data in the same files to speed up filter / WHERE operations via data skipping.
Auto Compaction
, when enabled, coalesces many small files into fewer, larger ones after a write, to mitigate the small-files problem and improve performance. It does this without being cognisant of the table's Z-order, so it can undo the clustering that a previous OPTIMIZE ... ZORDER BY achieved. NB: I have also read that it is most applicable to Structured Streaming workloads.
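One practical way to reconcile the two is to control these features per table and re-run the Z-order on a schedule. A rough sketch (the table name t1 and columns c1, c2 are illustrative; the delta.autoOptimize.* keys are Databricks Delta table properties):

```sql
-- Keep optimized writes on, but leave auto compaction off
-- for a table whose Z-order clustering you want to preserve
ALTER TABLE t1 SET TBLPROPERTIES (
  'delta.autoOptimize.optimizeWrite' = 'true',
  'delta.autoOptimize.autoCompact'   = 'false'
);

-- Periodically restore the clustering after new data arrives
OPTIMIZE t1 ZORDER BY (c1, c2);
```

The idea is that optimized writes already reduce the small-files problem at write time, while the scheduled OPTIMIZE ... ZORDER BY both compacts and re-clusters, so you don't need auto compaction rewriting files in between and disturbing the Z-order.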