
Apache Ignite 3 - Writing Records via C++ Thin Client


I want to put the result entries in Ignite, where four C++ apps are writing results to Ignite in batches of 3000. Each engine will write approximately 205,2000 records. This is how my Ignite service looks in Docker Compose:

<< : *ignite-def
container_name: ignite
command: --node-name ignite
deploy:
  resources:
    limits:
      cpus: '5'
      memory: 6g
environment:
  - JAVA_OPTS=-Xms6g -Xmx6g
ports:
  - 10300:10300
  - 10800:10800
networks:
  - mynet

After processing about 30K records, Ignite exits. I am assuming the process gets killed due to CPU constraints.

My question is: is there really no way to write this many records to Ignite locally? And is there a way to determine an ideal configuration for Ignite to run and process records like this? I am doing this for a research project and have spent a lot of time tweaking timeouts, RAM, and CPU parameters. I want to understand whether a local Ignite can process this much data and, if not, what an ideal (or at least decent enough) configuration for this setup would look like.

The C++ code snippet, which writes results to Ignite in batches of 3000 (earlier I tried 1000 as well):

query = "INSERT INTO stress_results VALUES ('" +
        uniqueJobId + "', '" +   // group_id_job_id as PK
        values[0] + "', '" +     // id
        scenarioLabel + "', " +
        values[2] + ", " +
        values[3] + ", " +
        values[4] + ")";

// Batch queries as a single multi-statement string
batchSql += query + ";";
batchCount++;
totalRows++;

// Execute batch when full or at end of data
if (batchCount >= batchSize || totalRows == static_cast<int>(allData.size())) {
    batchNum++;
    std::cout << "[IgniteDBHandler] Writing batch #" << batchNum << " with " << batchCount << " rows..." << std::endl;
    try {
        ignite::sql_statement stmt(batchSql);
        client.get_sql().execute_script(stmt, {});
        std::cout << "[IgniteDBHandler] Batch #" << batchNum << " complete." << std::endl;
    } catch (const std::exception& e) {
        std::cerr << "[IgniteDBHandler] Failed to execute batch script:\n" << batchSql << "\nError: " << e.what() << std::endl;
    }
    batchSql.clear();
    batchCount = 0;
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
}
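Independent of the memory question, each batch above is sent as 3000 separate `INSERT` statements joined into one script, which the server has to parse one by one. A single multi-row `INSERT ... VALUES (...), (...)` per batch is usually cheaper, assuming the SQL engine accepts multi-row `VALUES` (Ignite 3's Calcite-based engine does for plain inserts). A minimal sketch of just the string building, taking pre-rendered value tuples in the same column order as the snippet (it still concatenates untrusted strings, exactly as the original does):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Build one multi-row INSERT for a page of pre-rendered value tuples,
// e.g. rows = {"('job1','id1','s',1,2,3)", "('job2','id2','s',4,5,6)"}.
std::string build_multi_row_insert(const std::vector<std::string>& rows) {
    std::string sql = "INSERT INTO stress_results VALUES ";
    for (std::size_t i = 0; i < rows.size(); ++i) {
        if (i > 0) sql += ", ";   // comma-separate the value tuples
        sql += rows[i];
    }
    return sql;
}
```

The result can be executed with the same `execute_script` call as before, but the server now parses one statement per batch instead of 3000.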

My dockerized setup is running within WSL. WSL config:

[wsl2]
memory=20GB       
processors=18      
swap=12GB

The portions of the Ignite config that I have changed, based on what I could find online. This mostly involved increasing timeouts, memory, and similar settings:

raft {
        fsync=false
        installSnapshotTimeout=300000
        logStripesCount=2
        logYieldStrategy=false
        responseTimeout=10000
        retryDelay=500
        retryTimeout=3000000
        stripes=2
        volatileRaft {
            logStorageBudget {
                name=unlimited
            }
        }
    }

storage {
...
        profiles=[
            {
                engine=aipersist
                name=default
                replacementMode=CLOCK
                size=4294967296
            }
        ]
... }

My Docker setup has the following services:

Ignite logs right before it exits: [screenshots of the Apache Ignite logs]


Solution

  • You don't say, but I'm assuming that the server logs don't say much? That would be consistent with Docker killing the server process. So really this is a Docker/resources issue.

    It's difficult to be sure, but the most likely cause is that your container is running out of memory. I don't think your JAVA_OPTS has any effect, which means the node would be using the default Java heap, which is 16 GB. Also, Ignite stores its data off the Java heap.

    When sizing your container, you need to consider Ignite itself, plus the Java heap, plus the off-heap storage.
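    To make that concrete with the numbers from the question: the Compose file caps the container at 6g, while JAVA_OPTS asks for a 6 GiB heap and the `aipersist` storage profile reserves another 4 GiB off-heap (`size=4294967296`). A back-of-the-envelope check (illustrative arithmetic only; metaspace and other native overhead are ignored, which only makes the shortfall worse):

```cpp
#include <cassert>
#include <cstdint>

// Rough container sizing: Java heap + off-heap storage must fit under the
// container memory limit, with headroom left for metaspace/native overhead.
constexpr std::int64_t kGiB = 1024LL * 1024 * 1024;

constexpr std::int64_t heap_bytes      = 6 * kGiB;       // -Xmx6g from JAVA_OPTS
constexpr std::int64_t off_heap_bytes  = 4294967296LL;   // aipersist profile size (4 GiB)
constexpr std::int64_t container_limit = 6 * kGiB;       // deploy.resources.limits.memory

constexpr std::int64_t needed = heap_bytes + off_heap_bytes;  // 10 GiB before any overhead
constexpr bool fits = needed <= container_limit;              // false: over-committed
```

    With these settings the container is over-committed before Ignite allocates a single byte of overhead, so the OOM killer terminating the process after ~30K records is consistent with the symptoms.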

    It's also worth noting that loading the data as you are is not terribly efficient. It'll work (if there's enough memory available), but there are much faster methods, such as the record API or the data streamer.
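    The record API route could look roughly like the sketch below. The names (`ignite_client`, `get_tables`, `get_record_binary_view`, `upsert_all`, the header path, and the connection address) follow the Apache Ignite 3 C++ client but should be checked against your client version's headers, and it needs a running cluster, so treat it as an outline rather than a drop-in replacement:

```cpp
#include <chrono>
#include <vector>

#include "ignite/client/ignite_client.h"

// Sketch: write one batch of pre-built tuples via the record view instead of
// SQL scripts. One network round-trip per batch, no SQL parsing on the server.
void write_batch(ignite::ignite_client& client,
                 const std::vector<ignite::ignite_tuple>& batch) {
    auto table = client.get_tables().get_table("stress_results");
    if (!table)
        return;  // the table must already exist

    auto view = table->get_record_binary_view();
    view.upsert_all(nullptr, batch);  // nullptr = no explicit transaction
}
```

    Each tuple would carry the same columns as the `INSERT` in the question, along the lines of `ignite::ignite_tuple{{"group_id_job_id", uniqueJobId}, {"id", values[0]}, ...}` (column names assumed from the snippet's comments).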