I have a table with customer IDs, creditcard_number and creditcard_type (amex, mastercard, etc.) I am trying to find a way to:
I tried with change data feed (but it only tracks row-level changes) I tried with column mapping (but column mapping doesn't modify the data files too). And if it doesn't modify the files too, i can't see them in Dremio (which i'm using as a federator to load the data further to Ontopic - the virtual knowledge graph tool).
How can i change the name of a column? Then access it's old name. And everything has to be done with Delta. Is Delta event capable of this? Or am I missing something?
Yes, you can rename columns starting with Delta Lake 1.2.0, at least in Apache Spark (not sure about Dremio) - you need to enable column mapping by name (my table is called test
):
ALTER TABLE test SET TBLPROPERTIES (
'delta.columnMapping.mode' = 'name',
'delta.minReaderVersion' = '2',
'delta.minWriterVersion' = '5')
Then rename columns:
ALTER TABLE test rename column creditcard_type to type;
ALTER TABLE test rename column creditcard_number to number
And you can pull the old and new names from Delta table history with something like this (pull history, leave only RENAME COLUMN
operations and select old and new names from the operationParameters
column):
# fetch history for the table
tdf = spark.sql("describe history test") \
.filter("operation='RENAME COLUMN'") \
.selectExpr("operationParameters['oldColumnPath'] as old_name",
"operationParameters['newColumnPath'] as new_name")
# convert dataframe into a dict old_name -> new_name
names_mapping = dict([[r[0], r[1]] for r in tdf.collect()])
will give you:
>>> names_mapping
{'creditcard_number': 'number', 'creditcard_type': 'type'}