I have an SQL query that selects from two inner-joined tables. The select statement takes about 50 seconds. However, the fetchall() takes 788 seconds for only 981 results:
time0 = time.time()
self.cursor.execute("SELECT spectrum_id, feature_table_id "+
                    "FROM spectrum AS s "+
                    "INNER JOIN feature AS f "+
                    "ON f.msrun_msrun_id = s.msrun_msrun_id "+
                    "INNER JOIN (SELECT feature_feature_table_id, min(rt) AS rtMin, max(rt) AS rtMax, min(mz) AS mzMin, max(mz) AS mzMax "+
                    "FROM convexhull GROUP BY feature_feature_table_id) AS t "+
                    "ON t.feature_feature_table_id = f.feature_table_id "+
                    "WHERE s.msrun_msrun_id = ? "+
                    "AND s.scan_start_time >= t.rtMin "+
                    "AND s.scan_start_time <= t.rtMax "+
                    "AND base_peak_mz >= t.mzMin "+
                    "AND base_peak_mz <= t.mzMax", spectrumFeature_InputValues)
print 'query took:', time.time()-time0, 'seconds'

time0 = time.time()
spectrumAndFeature_ids = self.cursor.fetchall()
print time.time()-time0, 'seconds to fetchall'
Is there a reason why fetchall() takes so long? Doing:
while 1:
    info = self.cursor.fetchone()
    if info:
        <do something>
    else:
        break
is just as slow as:
allInfo = self.cursor.fetchall()
for info in allInfo:
    <do something>
By default, fetchall() is as slow as looping over fetchone(), because the arraysize of the Cursor object is set to 1.
To speed things up you can loop over fetchmany(), but to see a performance gain you need to provide it with a size parameter bigger than 1; otherwise it fetches "many" rows in batches of arraysize, i.e. 1.
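For example, a batched fetch loop might look like the following. This is just a minimal, self-contained sketch: the throwaway in-memory table and the batch size of 1000 are made up for illustration.

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (x INTEGER)")
cur.executemany("INSERT INTO t (x) VALUES (?)", [(i,) for i in range(10000)])

cur.execute("SELECT x FROM t")
while True:
    rows = cur.fetchmany(1000)  # fetch up to 1000 rows per call instead of 1
    if not rows:
        break                   # no rows left
    for row in rows:
        pass                    # <do something> with each row

Each fetchmany() call pulls a block of rows in one go instead of row by row, which is where the gain over a fetchone() loop comes from.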
It is quite possible that you can get the performance gain simply by raising the value of arraysize, but I have no experience doing this, so you may want to experiment with that first by doing something like:
>>> import sqlite3
>>> conn = sqlite3.connect(":memory:")
>>> cu = conn.cursor()
>>> cu.arraysize
1
>>> cu.arraysize = 10
>>> cu.arraysize
10
More on the above here: http://docs.python.org/library/sqlite3.html#sqlite3.Cursor.fetchmany
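Applied to the code in your question, that could look roughly like this. It is only a sketch: query stands for the SELECT string you already build, and the batch size of 500 is an arbitrary starting point to tune.

self.cursor.arraysize = 500                               # default is 1
self.cursor.execute(query, spectrumFeature_InputValues)   # your SELECT from above
while True:
    rows = self.cursor.fetchmany()                        # returns up to arraysize rows
    if not rows:
        break
    for spectrum_id, feature_table_id in rows:
        <do something>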