monetdb

What's the fastest way to clone a big table in MonetDB?


I got a table of 300 millions rows (and 10 columns) in MonetDB 11.47 and I want to clone it in the same server.

So far I've tried with

START TRANSACTION ISOLATION LEVEL SERIALIZABLE;
CREATE TABLE cloned_table AS (SELECT * FROM big_table);
COMMIT;

but, after calling COMMIT; MonetDB starts creating a huge transaction log blocking all other queries.

This is the status during the operation

Transaction size htop

As you can see the worst part is that MonetDB is using disk, even if there's a lot of free memory.

What would be the best approach??


Solution

  • MonetDB has to write a lot of data to disk for this query: once to secure the log file, and once to persist the data of the new table. The available memory doesn't change this fact. (large memory is mainly advantageous for queries producing large intermediate data, which is not the case here).

    The huge log file is caused by the size of the transaction, i.e. 300 rows * 10 cols * . MonetDB logs all data of a transaction in one file.

    Since you're copying a large table, it's recommendable to separate table creation and data insertion into their own transactions. CREATE TABLE blocks everything because it modifies the SQL catalogue, so you'd want to keep any such operations as short as possible. Then you can let INSERT INTO happily run in its own transaction without blocking queries on any other tables (concurrent updates on cloned_table is not recommended here). ==> this probably won't affect the situation described above, but it'll make your database better available for other queries in the meantime.

    Depending on how you're going to use the cloned_table, you can avoid the log file completely by making it into an UNLOGGED table [1]. You can persist the data, but it's not possible yet to enable the log for an UNLOGGED table afterwards [2].

    You can also consider copying the data from big_table to cloned_table in several smaller chunks. Or even make use of MERGE TABLE [1].

    [1] https://www.monetdb.org/documentation-Dec2023/user-guide/sql-manual/data-definition/table-definition/

    [2] Correction: seems it's possible after all with PERSIST_UNLOGGED.