Tags: mariadb, benchmarking, influxdb, influxdb-python

InfluxDB 2.0 has slow write performance with the Python client compared to MariaDB


I am new to InfluxDB and I am trying to compare the write performance of MariaDB and InfluxDB 2.0. To do so, I benchmark inserting about 350,000 rows that are stored in a text file (30 MB).

With MariaDB I use executemany to write multiple rows per call; loading all rows this way took about 20 seconds (from Python).
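For context, executemany is the DB-API 2.0 (PEP 249) cursor method that runs one parameterized statement over a whole sequence of parameter tuples. The sketch below uses Python's built-in sqlite3 module purely as a stand-in so it runs without a server; with a MariaDB driver the call has the same shape, although the placeholder style depends on the driver. The table name test and its seven REAL columns are hypothetical, chosen to mirror the seven fields in the question.

```python
import sqlite3

# Hypothetical schema: 7 numeric columns, mirroring the 7 fields in the question.
COLUMNS = [f"column_{i}" for i in range(1, 8)]


def load_rows(conn: sqlite3.Connection, rows) -> None:
    """Insert all rows with one executemany call inside a single transaction."""
    placeholders = ", ".join("?" for _ in COLUMNS)
    sql = f"INSERT INTO test ({', '.join(COLUMNS)}) VALUES ({placeholders})"
    with conn:  # commits once at the end instead of once per row
        conn.executemany(sql, rows)


conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE test ({', '.join(c + ' REAL' for c in COLUMNS)})")
# Synthetic rows standing in for the parsed text file.
rows = [tuple(float(i + j) for j in range(7)) for i in range(1000)]
load_rows(conn, rows)
count = conn.execute("SELECT COUNT(*) FROM test").fetchone()[0]
print(count)  # 1000
```

Committing once for the whole load, rather than per row, is usually a large part of why a bulk call like this is fast.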

I then tried the same with InfluxDB using the Python client. These are the main steps:

from influxdb_client import InfluxDBClient, Point, WriteOptions

# 'client' is an InfluxDBClient created earlier (connection details omitted)

# Configuring the write API (batching mode)
write_api = client.write_api(write_options=WriteOptions(batch_size=10_000, flush_interval=5_000))

# Creating the Point (7 fields in total)
p = Point("Test").field("column_1", value_1).field("column_2", value_2)

# Appending the point to a list
data.append(p)

# Writing the data as a whole into the database. I do this after collecting 200,000 points
# (this gave the best performance), then I clear "data" and start again.
write_api.write("bucket", "org", data)
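The Python client's write() also accepts records as InfluxDB line-protocol strings, not only Point objects. Building hundreds of thousands of Point objects has some per-object overhead, so one thing worth measuring (an assumption to benchmark, not a confirmed cause) is formatting the lines yourself. The sketch below is a minimal formatter that assumes every field value is a finite float and that no tags are needed; the helper names to_line, escape_key and escape_measurement are hypothetical.

```python
import math
from typing import Optional


def escape_measurement(name: str) -> str:
    """Escape commas and spaces in a measurement name."""
    return name.replace(",", r"\,").replace(" ", r"\ ")


def escape_key(key: str) -> str:
    """Escape commas, equals signs and spaces in a field key."""
    return key.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def to_line(measurement: str, fields: dict, timestamp_ns: Optional[int] = None) -> str:
    """Format one line-protocol record; assumes all field values are floats."""
    if not fields:
        raise ValueError("line protocol needs at least one field")
    parts = []
    for key, value in fields.items():
        number = float(value)
        if not math.isfinite(number):
            raise ValueError(f"field {key!r} is not a finite number")
        parts.append(f"{escape_key(key)}={number!r}")
    line = f"{escape_measurement(measurement)} {','.join(parts)}"
    if timestamp_ns is not None:  # optional timestamp, nanoseconds by default
        line += f" {int(timestamp_ns)}"
    return line


lines = [to_line("Test", {"column_1": 1, "column_2": 2.5})]
print(lines[0])  # Test column_1=1.0,column_2=2.5
```

The resulting list of strings can be passed to write_api.write("bucket", "org", lines) in place of the list of Point objects. Integer fields would need an i suffix and string fields quoting, which this sketch deliberately leaves out.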

Running this takes about 40 seconds, twice as long as MariaDB.
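Expressed as throughput, using the approximate figures from the question (350,000 rows, 30 MB):

```latex
\text{MariaDB: } \frac{350\,000\ \text{rows}}{20\ \text{s}} \approx 17\,500\ \text{rows/s},\quad \frac{30\ \text{MB}}{20\ \text{s}} \approx 1.5\ \text{MB/s}
\qquad
\text{InfluxDB: } \frac{350\,000\ \text{rows}}{40\ \text{s}} \approx 8\,750\ \text{rows/s},\quad \frac{30\ \text{MB}}{40\ \text{s}} \approx 0.75\ \text{MB/s}
```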

I have been stuck on this for quite some time. The documentation recommends writing in batches, which I do, so in theory it should be faster than MariaDB.
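Batching here means sending records in fixed-size groups instead of one request per record. A stdlib way to split any iterable of records into batches is sketched below. The batch_size of 10,000 mirrors the question's WriteOptions value, and send is a hypothetical callback standing in for whatever writes one batch (for example a synchronous write_api.write call). The client's own batching mode is also meant to group records internally, so a helper like this is mostly useful with a synchronous write API or for isolating where time is spent.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches(records: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size records."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(records)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield chunk


def write_in_batches(
    records: Iterable[T], send: Callable[[List[T]], None], batch_size: int = 10_000
) -> int:
    """Call send() once per batch and return how many batches were sent."""
    count = 0
    for chunk in batches(records, batch_size):
        send(chunk)
        count += 1
    return count


sent: List[List[int]] = []
n = write_in_batches(range(25_000), sent.append, batch_size=10_000)
print(n, [len(b) for b in sent])  # 3 [10000, 10000, 5000]
```

Because the generator holds only one batch in memory at a time, the input (for example lines read from the text file) can be streamed instead of first collected into a large list.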

I am probably missing something.

Thank you in advance!


Solution

  • It takes some time to shovel 30 MB of anything onto the disk.

    executemany probably batches the rows. (I don't know the details.)

    It sounds like InfluxDB does not do as good a job.

    To shovel lots of data into a table:

    There are hundreds of packages between the user and the database; InfluxDB is yet another one. I don't know its details.
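On the point that executemany probably does batching: one common way a SQL client can batch is to turn many single-row INSERTs into one multi-row INSERT ... VALUES (...), (...) statement, so the server parses one statement and the client makes one round trip. Whether MariaDB's connector does exactly this is not confirmed here; the sketch only illustrates the idea, with a hypothetical table and ? placeholders. Real servers cap statement size (in MariaDB, max_allowed_packet), so a large load would still be split into several such statements.

```python
from typing import List, Sequence, Tuple


def multi_row_insert(
    table: str, columns: Sequence[str], rows: Sequence[Sequence[object]]
) -> Tuple[str, List[object]]:
    """Build one parameterized multi-row INSERT plus its flattened parameters.

    table and columns are interpolated directly, so they must be trusted
    identifiers; only the row values go through placeholders.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    width = len(columns)
    if any(len(row) != width for row in rows):
        raise ValueError("every row needs exactly one value per column")
    group = "(" + ", ".join(["?"] * width) + ")"
    values = ", ".join([group] * len(rows))
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES {values}"
    params = [value for row in rows for value in row]
    return sql, params


sql, params = multi_row_insert("test", ["column_1", "column_2"], [(1, 2), (3, 4)])
print(sql)     # INSERT INTO test (column_1, column_2) VALUES (?, ?), (?, ?)
print(params)  # [1, 2, 3, 4]
```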