Situation: Need to insert quite a bit of data into a SQLite database.
Problem: There are two statements we can use to insert the data:
data = [("111", "222", "333"), ("AAA", "BBB", "CCC"), ("XXX", "YYY", "ZZZ")]
#method1
for item in data:
    cursor.execute("INSERT INTO table(value1, value2, value3) VALUES (?,?,?)", item)
    conn.commit()
#method2
cursor.executemany("INSERT INTO table(value1, value2, value3) VALUES(?,?,?)", data)
conn.commit()
Question: Ignoring speed, which one is better practice from a programming point of view? If possible, please explain why as well.
From a purely programming-practice point of view, aside from speed, there is no difference. However...
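Prepared statements are good. However, a mass insert that packs many rows into a single statement creates a mass of variable bindings, and SQLite has an upper limit on the number of host parameters it can process per statement (SQLITE_MAX_VARIABLE_NUMBER). That limit defaults to 999 in SQLite versions before 3.32.0 and to 32766 from 3.32.0 onward.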
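To make the binding count concrete, here is a minimal sketch of a multi-row INSERT that splits its input so no single statement exceeds the parameter limit. The table name demo, the helper insert_in_chunks, and the MAX_PARAMS constant are made up for illustration, and 999 is assumed as the limit (the default for older SQLite builds); adjust it for your build.

```python
import sqlite3

# Assumption: 999 bound parameters per statement (default in older SQLite builds).
MAX_PARAMS = 999


def insert_in_chunks(conn, rows, max_params=MAX_PARAMS):
    """Insert 3-column rows with multi-row VALUES, staying under max_params."""
    cols = 3
    rows_per_stmt = max_params // cols  # 333 rows -> 999 bindings per statement
    for start in range(0, len(rows), rows_per_stmt):
        chunk = rows[start:start + rows_per_stmt]
        # One "(?, ?, ?)" group per row in this chunk.
        placeholders = ", ".join(["(?, ?, ?)"] * len(chunk))
        # Flatten the row tuples into one long parameter list.
        flat = [value for row in chunk for value in row]
        conn.execute(
            "INSERT INTO demo(value1, value2, value3) VALUES " + placeholders,
            flat,
        )
    conn.commit()  # single commit after all chunks


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo(value1 TEXT, value2 TEXT, value3 TEXT)")
rows = [(str(i), str(i + 1), str(i + 2)) for i in range(1000)]
insert_in_chunks(conn, rows)
row_count = conn.execute("SELECT COUNT(*) FROM demo").fetchone()[0]
```

With 1000 rows of 3 values each, a single statement would need 3000 bindings, so the sketch issues four statements instead. The extra bookkeeping is the reason a plain loop is usually simpler.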
Thus, multi-insert is nice for toying around, but for real data you'll be using a loop. One piece of advice, though: wrap the loop in a transaction, because without it, AFAIK, each insert runs as its own automatic transaction, which drastically increases the run time. (Also, commit at the end of the loop, not within the loop.)
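One way to get "one transaction around the whole loop" is to use the connection as a context manager. This is only a sketch, assuming an in-memory database and a hypothetical table named demo; `with conn:` commits when the block finishes normally and rolls back if an exception escapes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo(value1 TEXT, value2 TEXT, value3 TEXT)")
data = [("111", "222", "333"), ("AAA", "BBB", "CCC"), ("XXX", "YYY", "ZZZ")]

# All inserts run inside one transaction; the commit happens once, on exit.
with conn:
    for item in data:
        conn.execute(
            "INSERT INTO demo(value1, value2, value3) VALUES (?,?,?)", item
        )

inserted = conn.execute("SELECT COUNT(*) FROM demo").fetchone()[0]

# If something fails part-way through, the whole batch is rolled back.
try:
    with conn:
        conn.execute("INSERT INTO demo VALUES ('a', 'b', 'c')")
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

after_failure = conn.execute("SELECT COUNT(*) FROM demo").fetchone()[0]
```

Note that `with conn:` manages the transaction only; it does not close the connection.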
EDIT: According to Python docs,
By default, the sqlite3 module opens transactions implicitly before a Data Modification Language (DML) statement (i.e. INSERT/UPDATE/DELETE/REPLACE), and commits transactions implicitly before a non-DML, non-query statement (i.e. anything other than SELECT or the aforementioned).
So your code in #method1 is doing [BEGIN], INSERT, COMMIT, [BEGIN], INSERT, COMMIT ... with BEGIN implicitly being sent by Python to start a transaction, and COMMIT explicitly ending it. If you structure your code like this:
for item in data:
    cursor.execute("INSERT INTO table(value1, value2, value3) VALUES (?,?,?)", item)
conn.commit()
then you have one implicit BEGIN at the start, lots of INSERTs and one explicit COMMIT at the end. This should speed up your code by 10-20x or so.
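If you want to check the difference on your own machine, a rough timing sketch could look like the following. The table name demo, the helper timed_insert, and the row count of 200 are all made up for illustration, and an on-disk temporary database is assumed because commits cost far less in memory. The actual ratio depends on your disk and SQLite settings, so treat the numbers as indicative only.

```python
import os
import sqlite3
import tempfile
import time


def timed_insert(rows, commit_each_row):
    """Insert rows into a fresh on-disk database; return (seconds, row count)."""
    with tempfile.TemporaryDirectory() as tmp:
        conn = sqlite3.connect(os.path.join(tmp, "demo.db"))
        conn.execute("CREATE TABLE demo(value1 TEXT, value2 TEXT, value3 TEXT)")
        start = time.perf_counter()
        for row in rows:
            conn.execute("INSERT INTO demo VALUES (?,?,?)", row)
            if commit_each_row:
                conn.commit()  # BEGIN/INSERT/COMMIT for every row
        conn.commit()  # final commit (the only one when commit_each_row is False)
        elapsed = time.perf_counter() - start
        count = conn.execute("SELECT COUNT(*) FROM demo").fetchone()[0]
        conn.close()  # close before the temporary directory is removed
    return elapsed, count


rows = [(str(i), "x", "y") for i in range(200)]
slow, n_slow = timed_insert(rows, commit_each_row=True)
fast, n_fast = timed_insert(rows, commit_each_row=False)
print(f"commit per row: {slow:.4f}s, single commit: {fast:.4f}s")
```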