delta-lake

Confusion About Delta Lake


I have read a lot about Databricks Delta Lake. From what I understand, it adds ACID transactions to your data storage and accelerates query performance with the Delta Engine. If so, why do we need other data lakes that do not support ACID transactions? Delta Lake claims to combine the worlds of the data lake and the data warehouse, but we know that it cannot yet replace a traditional data warehouse because of the limited set of operations it currently supports. But should it replace data lakes? Why would we need two copies of the data, one in a data lake and one in Delta Lake?


Solution

  • Delta Lake is a product (like Redshift) rather than a concept/approach/theory (like dimensional modelling). As with any product in any walk of life, some of the claims made for it will be true and some will be marketing spin. Whether a product's claimed benefits actually make it superior to an alternative depends on the use case.

    Asking why there are other data lake solutions besides Delta Lake is a bit like asking why there is more than one DBMS in the world.