I am new to InfluxDB and I am trying to compare the performance of MariaDB and InfluxDB 2.0. To do this, I benchmark about 350,000 rows that are stored in a txt file (30 MB).
With MariaDB I use `executemany` to write multiple rows into the database, which takes about 20 seconds for all rows (using Python).
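Roughly, the MariaDB side looks like this (a minimal sketch; the credentials, table name, and two-column rows are placeholders, and I am assuming the MariaDB Connector/Python here, though any DB-API driver's `executemany` works the same way):

```python
import mariadb  # placeholder driver choice; any DB-API connector offers executemany

conn = mariadb.connect(user="user", password="pass",
                       host="localhost", database="benchmark")
cur = conn.cursor()

# rows holds the ~350,000 tuples parsed from the txt file (7 columns in reality)
rows = [(1.0, 11.0), (2.0, 22.0)]

cur.executemany("INSERT INTO test (column_1, column_2) VALUES (?, ?)", rows)
conn.commit()
conn.close()
```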
So I tried the same with InfluxDB using the Python client; below are the major steps of how I do it.
from influxdb_client import Point, WriteOptions

# Configuring the write api (the client itself was created earlier with InfluxDBClient)
write_api = client.write_api(write_options=WriteOptions(batch_size=10_000, flush_interval=5_000))

# Creating the Point
p = Point("Test").field("column_1", value_1).field("column_2", value_2)  # having 7 fields in total

# Appending the point to create a list
data.append(p)

# Then writing the data as a whole into the database. I do this after collecting
# 200,000 points (this had the best performance), then I clear the variable "data" and start again.
write_api.write("bucket", "org", data)
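Putting the pieces together, the full loop looks roughly like this (the URL, token, and the `parse_rows` helper are placeholders I made up for illustration; the final `write_api.close()` flushes any pending batches, since the batching write API works asynchronously):

```python
from influxdb_client import InfluxDBClient, Point, WriteOptions

client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")  # placeholders
write_api = client.write_api(write_options=WriteOptions(batch_size=10_000, flush_interval=5_000))

data = []
for value_1, value_2 in parse_rows("data.txt"):  # hypothetical parser for the 350,000-row file
    data.append(Point("Test").field("column_1", value_1).field("column_2", value_2))
    if len(data) >= 200_000:  # the flush threshold that performed best for me
        write_api.write("bucket", "org", data)
        data = []

if data:  # write the remainder
    write_api.write("bucket", "org", data)

write_api.close()  # blocks until pending batches are flushed
client.close()
```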
Executing this takes about 40 seconds, which is double the time MariaDB needs.
I have been stuck on this problem for quite some time now, because the documentation suggests writing in batches, which I do, and in theory it should be faster than MariaDB. But I am probably missing something.
Thank you in advance!
It takes some time to shovel 20 MB of anything onto the disk. `executemany` probably does batching. (I don't know the details.) It sounds like InfluxDB does not do as good a job.
To shovel lots of data into a table:

- `LOAD DATA INFILE` is the fastest. But if you have to first create that file, it may not win the race.
- Multi-row `INSERT`s are very fast: `INSERT ... VALUES (1,11), (2,22), ...`. For 100 rows, that runs about 10 times as fast as single-row `INSERT`s. Beyond 100 or so rows, it gets into "diminishing returns".
- Batching `INSERT`s into a "transaction" avoids transactional overhead. (Again there is "diminishing returns".) See the sketch after this list.

There are a hundred packages between the user and the database; InfluxDB is yet another one. I don't know the details.
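A rough Python sketch of the multi-row `INSERT` plus single-transaction idea (the connection details, table, and batch size of 100 are made-up assumptions; any DB-API driver behaves similarly):

```python
import mariadb  # placeholder driver; any DB-API connector works the same way

conn = mariadb.connect(user="user", password="pass",
                       host="localhost", database="benchmark")
conn.autocommit = False  # group all inserts into one transaction
cur = conn.cursor()

rows = [(i, i * 1.5) for i in range(100_000)]  # stand-in for the parsed txt file
batch = 100  # ~100 rows per INSERT is where diminishing returns set in

for start in range(0, len(rows), batch):
    chunk = rows[start:start + batch]
    placeholders = ", ".join(["(?, ?)"] * len(chunk))
    flat = [v for row in chunk for v in row]
    cur.execute(f"INSERT INTO test (column_1, column_2) VALUES {placeholders}", flat)

conn.commit()  # one commit at the end avoids per-row transactional overhead
conn.close()
```

Note that some drivers already rewrite `executemany` into this multi-row form under the hood, which may be why the 20-second MariaDB result is already good.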