delta-lake

Custom metadata/tags for Delta Lake?


I'm trying to tie two tables' versions together. For example, if table A's version 1 was used to generate table B's version 3, I want to be able to tell that. Does Delta Lake already have something that makes this easy?

I think I could try to always keep the two version numbers in step: whenever I change one table, I'd also perform an extra operation on the other. But this doesn't seem like a real solution, or anywhere near a robust one.

Thank you in advance!


Solution

  • Since there is no custom metadata mechanism in Delta and since there is no way to coordinate transactions across Delta tables, the best practice for addressing this problem is to add extra columns to the data.

    Don't worry about storage cost, because Parquet's encoding and compression use very little space for long runs of the same value in a column. Don't worry about query performance either, because (a) if you don't need the metadata columns, they won't be read, and (b) Delta's stats collection will help prune files if you do need to filter by the metadata.

    Hope this helps.