I have a very large table with 250,000+ rows, many of which contain a large text block in one of the columns. Right now it's 2.7 GB and is expected to grow at least tenfold. I need to perform Python-specific operations on every row of the table, but I only need to access one row at a time.
Right now my code looks something like this:
    c.execute('SELECT * FROM big_table')
    table = c.fetchall()
    for row in table:
        do_stuff_with_row(row)
This worked fine when the table was smaller, but the table is now larger than my available RAM, and Python hangs when I try to run it. Is there a better (more RAM-efficient) way to iterate row by row over the entire table?
cursor.fetchall() fetches all of the results into a list first, so the entire result set has to fit in memory at once.
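To make that concrete, here is a minimal sketch that assumes the standard-library sqlite3 module and a small, hypothetical in-memory stand-in for big_table (the id/body columns are invented for illustration). fetchall() hands back an ordinary Python list holding every row at once, which is what stops fitting in RAM once the table grows past available memory.

```python
import sqlite3

# Hypothetical stand-in for big_table: an in-memory SQLite database
# with only 1000 rows, just to show what fetchall() returns.
conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute("CREATE TABLE big_table (id INTEGER PRIMARY KEY, body TEXT)")
c.executemany(
    "INSERT INTO big_table (body) VALUES (?)",
    [("text block %d" % i,) for i in range(1000)],
)
conn.commit()

c.execute("SELECT * FROM big_table")
table = c.fetchall()  # every row is materialized in this one list

print(type(table).__name__)  # list
print(len(table))            # 1000
```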
Instead, you can iterate over the cursor itself:
    c.execute('SELECT * FROM big_table')
    for row in c:
        do_stuff_with_row(row)
This produces rows one at a time as the loop asks for them, rather than loading them all into memory first.
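A self-contained sketch of the row-by-row loop, again assuming the sqlite3 module and a hypothetical in-memory big_table. The do_stuff_with_row function here is a hypothetical placeholder that simply returns the length of the text column; swap in your real per-row work. The loop never builds a list of all rows, so only the row currently being processed is held by your code.

```python
import sqlite3


def do_stuff_with_row(row):
    """Hypothetical per-row work: return the length of the text column."""
    row_id, body = row
    return len(body)


# Hypothetical stand-in for big_table with 100 rows of growing text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big_table (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO big_table (body) VALUES (?)",
    (("x" * i,) for i in range(1, 101)),
)
conn.commit()

c = conn.cursor()
c.execute("SELECT * FROM big_table")

total = 0
processed = 0
# Iterating over the cursor pulls one row per loop step instead of
# collecting every row into a list up front.
for row in c:
    total += do_stuff_with_row(row)
    processed += 1

print(processed, total)  # 100 5050
```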