performancesqlitefetchall

Is it normal sqlite.fetchall() is slow?


I have an SQL query that selects from two inner joined tables. The select statement takes about 50 seconds. However, the fetchall() takes 788 seconds for only 981 results:

time0 = time.time()
self.cursor.execute("SELECT spectrum_id, feature_table_id "+
                    "FROM spectrum AS s "+
                    "INNER JOIN feature AS f "+
                    "ON f.msrun_msrun_id = s.msrun_msrun_id "+
                    "INNER JOIN (SELECT feature_feature_table_id, min(rt) AS rtMin, max(rt) AS rtMax, min(mz) AS mzMin, max(mz) as mzMax "+
                                 "FROM convexhull GROUP BY feature_feature_table_id) AS t "+
                    "ON t.feature_feature_table_id = f.feature_table_id "+
                    "WHERE s.msrun_msrun_id = ? "+
                    "AND s.scan_start_time >= t.rtMin "+
                    "AND s.scan_start_time <= t.rtMax "+
                    "AND base_peak_mz >= t.mzMin "+
                    "AND base_peak_mz <= t.mzMax", spectrumFeature_InputValues)
print 'query took:',time.time()-time0,'seconds'
    
time0 = time.time()
spectrumAndFeature_ids = self.cursor.fetchall()      
print time.time()-time0,'seconds since to fetchall'

Is there a reason why fetchall() takes so long? Doing:

while 1:
    info = self.cursor.fetchone()
    if info:
        <do something>
    else:
        break

is just as slow as :

allInfo = self.cursor.fetchall()         
for info in allInfo:
    <do something>

Solution

  • By default fetchall() is as slow as looping over fetchone() due to the arraysize of the Cursor object being set to 1.

    To speed things up you can loop over fetchmany(), but to see a performance gain, you need to provide it with a size parameter bigger than 1, otherwise it'll fetch "many" by batches of arraysize, i.e. 1.

    It is quite possible that you can get the performance gain simply by raising the value of arraysize, but I have no experience doing this, so you may want to experiment with that first by doing something like:

    >>> import sqlite3
    >>> conn = sqlite3.connect(":memory:")
    >>> cu = conn.cursor()
    >>> cu.arraysize
    1
    >>> cu.arraysize = 10
    >>> cu.arraysize
    10
    

    More on the above here: http://docs.python.org/library/sqlite3.html#sqlite3.Cursor.fetchmany