I am using the pg8000 package with Python 3 to query a table, and I've noticed the app's memory consumption grows with the table's record count; it is now over 16GB. While inspecting pg8000's cursor behaviour, I found that the cursor caches the whole result set in an in-memory queue under its _cached_rows attribute before any rows are dequeued by the .fetchone(), .fetchmany(n) or .fetchall() methods.
In my specific case, I am extracting records from AWS Redshift using a Python shell Glue job, which restricts me to pure-Python packages (so unfortunately I can't use psycopg, which does support server-side cursors). I have gone through the pg8000 documentation but couldn't find any reference to setting up a server-side cursor. Is there any way to set up a server-side cursor with pg8000? Or is there another pure-Python Postgres/Redshift package that supports server-side cursors?
The pg8000 docs now have an example of server-side cursors:
>>> import pg8000.native
>>>
>>> con = pg8000.native.Connection(username, password="cpsnow")
>>> con.run("START TRANSACTION")
[]
>>> con.run("DECLARE c SCROLL CURSOR FOR SELECT * FROM generate_series(1, 100)")
[]
>>> con.run("FETCH FORWARD 5 FROM c")
[[1], [2], [3], [4], [5]]
>>> con.run("MOVE FORWARD 50 FROM c")
[]
>>> con.run("FETCH BACKWARD 10 FROM c")
[[54], [53], [52], [51], [50], [49], [48], [47], [46], [45]]
>>> con.run("CLOSE c")
[]
>>> con.run("ROLLBACK")
[]
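To keep memory bounded for a large table, the same DECLARE/FETCH pattern can be wrapped in a loop that pulls rows down in fixed-size batches, so only one batch is ever held in Python memory at a time. Here is a minimal sketch; the function name stream_rows, the cursor name stream_cur, and the batch size are my own illustrative choices, and it assumes a pg8000.native.Connection-like object:

```python
def stream_rows(con, query, batch_size=1000):
    """Yield rows from `query` one at a time, fetching `batch_size`
    rows per round trip via a server-side cursor."""
    con.run("START TRANSACTION")
    try:
        # DECLARE creates a server-side cursor: the full result set
        # stays on the server until each FETCH pulls the next batch.
        con.run(f"DECLARE stream_cur CURSOR FOR {query}")
        while True:
            batch = con.run(f"FETCH FORWARD {batch_size} FROM stream_cur")
            if not batch:
                break  # cursor exhausted
            for row in batch:
                yield row
    finally:
        con.run("CLOSE stream_cur")
        con.run("ROLLBACK")  # end the read-only transaction
```

You would consume it like `for row in stream_rows(con, "SELECT * FROM big_table", 5000): ...`, trading one network round trip per batch for a flat memory profile.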