database, performance, time-series, questdb

Speed up basic QuestDB ingestion


I am starting a basic QuestDB instance using Docker Compose (YAML below). I want to test ingestion, and I am not sure what I am doing wrong: I have seen claims that QuestDB can ingest more than a million events per second, but I cannot get past 100K events per second with a very basic script.

This is my yaml:

services:
  questdb:
    image: questdb/questdb:8.1.1
    container_name: rta_questdb
    restart: always
    ports:
      - "8812:8812"   # PostgreSQL wire protocol
      - "9000:9000"   # REST API / web console / ILP over HTTP
      - "9009:9009"   # ILP over TCP
      - "9003:9003"   # min health server / metrics
    extra_hosts:
      - "host.docker.internal:host-gateway"
    environment:
      - QDB_METRICS_ENABLED=TRUE
    volumes:
      - ./questdb/questdb_root:/var/lib/questdb/:rw

And this is my basic Python script simulating IoT data. I am not creating the table in advance, but relying on QuestDB's table auto-creation:

from questdb.ingress import Sender, IngressError, TimestampNanos
import sys
import random


DEVICE_TYPES = ["blue", "red", "green", "yellow"]
EVENTS = 10000000
MIN_LAT = 19.50139
MAX_LAT = 64.85694
MIN_LON = -161.75583
MAX_LON = -68.01197

try:
    conf = 'http::addr=host.docker.internal:9000;'
    with Sender.from_conf(conf) as sender:
        for _ in range(EVENTS):
            sender.row(
                'iot_data',
                symbols={'device_type': random.choice(DEVICE_TYPES)},
                columns={
                    'duration_ms': random.randint(0, 4000),
                    'lat': random.uniform(MIN_LAT, MAX_LAT),
                    'lon': random.uniform(MIN_LON, MAX_LON),
                    'measure1': random.randint(-2147483648, 2147483647),
                    'measure2': random.randint(-2147483648, 2147483647),
                    'speed': random.randint(0, 100)
                },
                at=TimestampNanos.now())
except IngressError as e:
    sys.stderr.write(f'Got error: {e}\n')

I am running this on an M1 MacBook Pro.

Any ideas why I am not getting the advertised performance?


Solution

  • There are a number of reasons why the code above inserts "only" 100K events per second.

    Docker on Mac

    The first one is Docker on Mac. QuestDB can run in Docker, but Docker for Mac has some limitations (you will notice a warning both in the logs and on the web console when you start QuestDB with Docker on Mac). Running QuestDB in Docker on a Linux machine, or running QuestDB from the binary on a Mac, should give you better performance out of the box. Testing the same script against QuestDB running from the binary on an M1 MacBook Pro gives me ~230K events per second.
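    If you want to compare environments yourself, a quick way to measure throughput is to time the run and divide the event count by the elapsed time. A minimal sketch (the reduced event count and the timing wrapper are mine, not part of the original script):

    import time
    import random
    from questdb.ingress import Sender, TimestampNanos

    EVENTS = 1_000_000  # smaller run, enough for a quick comparison

    if __name__ == '__main__':
        start = time.monotonic()
        with Sender.from_conf('http::addr=localhost:9000;') as sender:
            for _ in range(EVENTS):
                sender.row(
                    'iot_data',
                    symbols={'device_type': random.choice(['blue', 'red'])},
                    columns={'speed': random.randint(0, 100)},
                    at=TimestampNanos.now())
        # Leaving the 'with' block flushes any buffered rows, so the
        # elapsed time includes the final flush.
        elapsed = time.monotonic() - start
        print(f'{EVENTS / elapsed:,.0f} events/second')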

    Parallelisation

    While QuestDB can ingest several million rows per second, those numbers are measured with several connections sending data in parallel; on a single connection you cannot get that much throughput. In production you would ideally run multiple scripts in parallel (maybe multiple Kafka Connect workers, if your data is coming from Kafka). We can test the effect of sending data in parallel by modifying the ingestion script (note this version sends a slightly smaller set of columns):

    from questdb.ingress import Sender, IngressError, TimestampNanos
    import sys
    import random
    from multiprocessing import Pool

    DEVICE_TYPES = ["blue", "red", "green", "yellow"]
    TOTAL_EVENTS = 10000000
    NUM_SENDERS = 8

    def send(sender_id, total_events):
        sys.stdout.write(f"Sender {sender_id} will send {total_events} events\n")
        try:
            conf = 'http::addr=localhost:9000;'
            with Sender.from_conf(conf) as sender:
                for _ in range(total_events):
                    sender.row(
                        'iot_data',
                        symbols={'device_type': random.choice(DEVICE_TYPES)},
                        columns={
                            'duration_ms': random.randint(0, 4000),
                            'measure1': random.randint(-2147483648, 2147483647),
                            'measure2': random.randint(-2147483648, 2147483647),
                            'speed': random.randint(0, 100)
                        },
                        at=TimestampNanos.now()
                    )
        except IngressError as e:
            sys.stderr.write(f'Sender {sender_id} got error: {e}\n')

    if __name__ == '__main__':
        # Split the total evenly across senders, distributing any remainder.
        events_per_sender = TOTAL_EVENTS // NUM_SENDERS
        remaining_events = TOTAL_EVENTS % NUM_SENDERS

        sender_events = [events_per_sender] * NUM_SENDERS
        for i in range(remaining_events):
            sender_events[i] += 1

        # One process per sender, each opening its own connection to QuestDB.
        with Pool(processes=NUM_SENDERS) as pool:
            sender_ids = range(NUM_SENDERS)
            pool.starmap(send, [(sender_id, sender_events[sender_id]) for sender_id in sender_ids])


    When running the script, I can see ~1,499,000 events per second.
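    Related to this, the Python client buffers rows and flushes them in batches, and with the HTTP transport you can tune that behaviour through the configuration string. A sketch with illustrative values (not recommendations; see the questdb-client documentation for the defaults):

    from questdb.ingress import Sender

    # Flush every 100,000 buffered rows, or at least once per second,
    # whichever comes first (illustrative values).
    conf = 'http::addr=localhost:9000;auto_flush_rows=100000;auto_flush_interval=1000;'
    with Sender.from_conf(conf) as sender:
        ...  # sender.row(...) calls as in the scripts above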

    Config settings

    In a production environment, you would probably want to adjust config parameters like QDB_LINE_TCP_WRITER_WORKER_COUNT and other related settings, which allow you to configure how many workers are in the shared pool and how many workers are reserved for writing, pre-processing, and so on.
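    As a sketch, these can be passed as environment variables in the compose file from the question (the counts below are placeholders; check the QuestDB configuration reference for the exact names and for values that suit your hardware):

    environment:
      - QDB_METRICS_ENABLED=TRUE
      # Placeholder values, not recommendations:
      - QDB_SHARED_WORKER_COUNT=4
      - QDB_LINE_TCP_WRITER_WORKER_COUNT=2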

    Those are low-level parameters that you fine-tune once you are already up and running, but in this particular case I believe Docker on Mac and the lack of parallelisation were the biggest bottlenecks for your ingestion.