c performance monetdblite

Using monetdb_append from the MonetDBLite C API


I am trying to use MonetDBLite C in an application. According to the paper (https://arxiv.org/pdf/1805.08520.pdf), loading massive amounts of data should be considerably faster with the monetdb_append function. From the paper:

In addition to issuing SQL queries, the embedded process can efficiently bulk append large amounts of data to the database using the monetdb_append function. This function takes the schema and the name of a table to append to, and a reference to the data to append to the columns of the table. This function allows for efficient bulk insertions, as there is significant overhead involved in parsing individual INSERT INTO statements, which becomes a bottleneck when the user wants to insert a large amount of data.

This is the declaration in embedded.h:

char* monetdb_append(monetdb_connection conn, const char* schema, const char* table, append_data *data, int ncols);

Does anybody have an example of how to use this function? I assume that the batid field of the append_data structure identifies a BAT structure, but it is not clear how that can be used with the existing API.


Solution

  • The binary append indeed requires constructing as many BAT structures as you have columns to append. Some additional MonetDBLite headers need to be included (monetdb_config.h and gdk.h). The important steps are:

    1. Create the BATs using COLnew with the correct type and count.
    2. Add some values to them, e.g. by casting bat->theap.base to a pointer of the column's C type and indexing that pointer, so each write has the correct element width.
    3. Set the BAT properties (BATsetcount, BATsettrivprop and BBPkeepref) for the append.
    4. Allocate and populate the append_data data structure, one entry per column.
    5. Call monetdb_append.

    Below is a short example of how to append 42 values to a one-column table of integers (CREATE TABLE test (my_column INTEGER);):

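    // startup, connect etc. before; needs monetdb_config.h, gdk.h and embedded.h

    size_t n = 42;
    // 1. create a transient int BAT with room for n values
    BAT* b = COLnew(0, TYPE_int, n, TRANSIENT);
    if (b == NULL) { /* handle allocation failure */ }
    // 2. write the values through a pointer of the column's C type
    int* values = (int*)b->theap.base;
    for (size_t i = 0; i < n; i++) {
        values[i] = (int)i; // or whatever
    }

    // 3. set the count and properties, then keep a reference for the append
    BATsetcount(b, n);
    BATsettrivprop(b);
    BBPkeepref(b->batCacheid);

    // 4. one append_data entry per column
    append_data *ad = malloc(1 * sizeof(append_data));
    if (ad == NULL) { /* handle allocation failure */ }
    ad[0].colname = "my_column";
    ad[0].batid = b->batCacheid;

    // 5. a non-NULL return value is an error message
    if (monetdb_append(conn, "sys", "test", ad, 1) != NULL) { /* handle error */}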
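
For tables with more than one column, step 4 presumably generalizes to one append_data entry per column. The sketch below is not part of MonetDBLite: make_append_data and append_data_sketch are hypothetical names, and the stand-in struct only assumes the two fields used above (colname and batid, with batid taken to be a plain int). In real code you would include embedded.h and use its append_data instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for MonetDBLite's append_data. ASSUMPTION: it has these two
 * fields, the only ones the example above relies on. */
typedef struct {
    const char *colname; /* column name in the target table */
    int batid;           /* ASSUMPTION: BAT ids are plain ints */
} append_data_sketch;

/* Hypothetical helper: builds one entry per column from two parallel arrays.
 * Returns a malloc'd array that the caller must free, or NULL if ncols is 0
 * or the allocation fails. The name strings are not copied. */
static append_data_sketch *make_append_data(const char *const *names,
                                            const int *batids, size_t ncols)
{
    assert(names != NULL && batids != NULL);
    if (ncols == 0) {
        return NULL;
    }
    append_data_sketch *ad = calloc(ncols, sizeof *ad);
    if (ad == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < ncols; i++) {
        ad[i].colname = names[i];
        ad[i].batid = batids[i];
    }
    return ad;
}
```

With the real append_data type, the resulting array and ncols would be passed as the last two arguments of monetdb_append. It seems reasonable to assume that every BAT must then hold the same number of values, since each one supplies a single column of the same rows.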