python-3.xamazon-redshiftaws-gluepg8000

Is there a server-side cursor on pure python based postgresql/redshift driver?


I am using pg8000 package with Python 3 to query a table, and noticing the app memory consumption is growing as the table's record number grows which is now reaching over 16GB of memory consumption.

While inspecting cursor behaviour of pg8000 package, I found the cursor cached the whole result set into an in-memory queue under curr._cached_rows property before being dequeued by any .fetchone(), .fetchmany(n) or .fetchall() methods.

In my specific case, I am extracting records on AWS Redshift by using Python shell on Glue job that comes with 2 restrictions:

  1. No supports for Python's C based package (no psycopg unfortunately, which supports server-side cursor's cache)
  2. Maximum of 16GB RAM.

I have gone through the pg8000 documentation but couldn't find any reference to setting up server-side cursor. Is there anyway to setup server-side cursor on pg8000 ? Or is there any other postgres/redshift pure Python package that supports server-side cursor?


Solution

  • The pg8000 docs now have an example of server-side cursors:

    >>> import pg8000.native
    >>>
    >>> con = pg8000.native.Connection(username, password="cpsnow")
    >>> con.run("START TRANSACTION")
    []
    >>> con.run("DECLARE c SCROLL CURSOR FOR SELECT * FROM generate_series(1, 100)")
    []
    >>> con.run("FETCH FORWARD 5 FROM c")
    [[1], [2], [3], [4], [5]]
    >>> con.run("MOVE FORWARD 50 FROM c")
    []
    >>> con.run("FETCH BACKWARD 10 FROM c")
    [[54], [53], [52], [51], [50], [49], [48], [47], [46], [45]]
    >>> con.run("CLOSE c")
    []
    >>> con.run("ROLLBACK")
    []