[SOLVED] Apache Iceberg tables and primary keys

We’re planning to move our data from an on-prem Microsoft SQL Server to AWS and are evaluating table formats like Hudi, Delta Lake, and Apache Iceberg. Our current setup in SQL Server uses auto-increment IDs for most of our primary keys, and Iceberg doesn't seem to have a straightforward equivalent.

I’m trying to figure out the best way to deal with unique identifiers in Iceberg, especially since we rely on these auto-increment IDs a lot. To take a stock market example, you would have a Security table with details like Security Code, Description, and ISIN, and a Price table where each price entry is linked to a security via its ID.

Any suggestions on how to replicate or replace the auto-increment functionality in Iceberg?

Solution

Correct, Iceberg doesn't have a built-in auto-increment column, so the answer depends more on the SQL engine you use for processing. For example, Trino has a UUID data type that can act as a primary key, and UUID is supported by Iceberg as well.
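
To make the UUID idea concrete, here is a minimal sketch in plain Python using the Security and Price tables from the question. The row contents, ISINs, and the `new_id` helper are made up for illustration, and the rows are ordinary dicts rather than real Iceberg writes.

```python
import uuid

def new_id() -> str:
    """Return a random (version 4) UUID as a string."""
    return str(uuid.uuid4())

# Hypothetical Security rows; the codes and ISINs are placeholders.
securities = [
    {"security_code": "ABC", "description": "Example Corp", "isin": "XX0000000001"},
    {"security_code": "XYZ", "description": "Sample Ltd", "isin": "XX0000000002"},
]
for row in securities:
    # The ID is generated by the writer, not by the table.
    row["security_id"] = new_id()

# Price rows reference a security by its generated UUID.
id_by_code = {row["security_code"]: row["security_id"] for row in securities}
prices = [
    {"security_id": id_by_code["ABC"], "price": 10.5},
    {"security_id": id_by_code["XYZ"], "price": 20.25},
]
```

The trade-off compared with auto-increment is that the IDs carry no ordering and are larger than an integer, but any writer can generate them without coordinating with the others.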

Or you can implement a UDF if you're using Spark as the processing engine.
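
As one possible shape for such a UDF, the sketch below derives a stable ID from the ISIN instead of counting rows. This is my own assumption about what the function could do, not something the answer specifies; the namespace string and the `security_id_from_isin` name are hypothetical. It is plain Python, so it would still need to be registered with your engine (in PySpark, for example, by wrapping it with `pyspark.sql.functions.udf`).

```python
import uuid

# Hypothetical namespace for this example. Any fixed UUID works,
# as long as it never changes once IDs have been issued.
SECURITY_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "securities.example.com")

def security_id_from_isin(isin: str) -> str:
    """Derive a stable (version 5) UUID from an ISIN."""
    normalized = isin.strip().upper()
    return str(uuid.uuid5(SECURITY_NAMESPACE, normalized))
```

Because the same ISIN always produces the same ID, the job loading the Price table can compute the key directly instead of looking it up in the Security table. This only works for tables that have a reliable natural key to hash.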

Or, similar to Oracle sequences (Oracle now also supports auto-increment), you can implement it yourself with a table that holds the current value and increment that value after each insert.
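
Here is a rough sketch of that sequence-table pattern. It uses an in-memory SQLite database as a stand-in so it runs anywhere; the `sequences` table, its columns, and the `next_value` helper are names I made up. With Iceberg you would run the equivalent statements through your engine, and you would need some way to make sure two writers never read the same value at the same time.

```python
import sqlite3

def next_value(conn: sqlite3.Connection, name: str) -> int:
    """Increment the named counter and return the new value in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE sequences SET current_value = current_value + 1 WHERE name = ?",
            (name,),
        )
        row = conn.execute(
            "SELECT current_value FROM sequences WHERE name = ?", (name,)
        ).fetchone()
    if row is None:
        raise KeyError(f"unknown sequence: {name}")
    return row[0]

# Set up the counter table with one sequence starting at 0.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequences (name TEXT PRIMARY KEY, current_value INTEGER NOT NULL)"
)
conn.execute("INSERT INTO sequences VALUES ('security_id', 0)")
conn.commit()

first = next_value(conn, "security_id")   # 1
second = next_value(conn, "security_id")  # 2
```

If you load in batches, reserving a whole range per batch (adding the batch size instead of 1) keeps the number of updates to the counter table small.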