python · sql · snowflake-cloud-data-platform

Extract 4 million rows from database in chunks of 1 million rows each


I am new to Python. I have seen other answers, but they are not working as expected.

Can anyone help me?

chunksize = 1000000
query = "select * from emp_table;"
cursor = conn.cursor()
cursor.execute(query)
r = []


Solution

  • As you mentioned, because of the high volume the data should be loaded in chunks to reduce load. However, loading every chunk and appending it to r would itself be very memory-intensive.

    To avoid that, you can load one chunk, write it out, then load the next chunk and do the same, so that memory is never overloaded.

    Here is some sample code:

    import snowflake.connector
    import csv
    
    def extract_and_write_data(conn, query, chunksize=1000000,
                               filename='extracted_data.csv'):
        cursor = conn.cursor()
        cursor.execute(query)
    
        with open(filename, 'w', newline='') as f:
            writer = csv.writer(f)
            # cursor.description holds one entry per column; the first
            # element of each entry is the column name.
            writer.writerow([col[0] for col in cursor.description])
    
            # The cursor holds the full result set; fetchmany() pulls it
            # down chunksize rows at a time. Each chunk is written to the
            # file before the next one is fetched, so only one chunk is
            # in memory at any point.
            while True:
                chunk = cursor.fetchmany(chunksize)
                writer.writerows(chunk)
    
                # Fewer rows than requested means there is nothing left to read
                if len(chunk) < chunksize:
                    break
    
    # Connect to Snowflake
    conn = snowflake.connector.connect(
        user='your_user',
        password='your_password',
        account='your_account',
        warehouse='your_warehouse',
        database='your_database',
        schema='your_schema'
    )
    
    # SQL query to extract all rows from the table
    query = "SELECT * FROM emp_table"
    
    # Extract and write data to CSV
    extract_and_write_data(conn, query)
    
    # Close the connection
    conn.close()
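
    If pandas is installed alongside the connector (the `snowflake-connector-python[pandas]` extra), another option is the cursor's `fetch_pandas_batches()` method, which streams the result set as a sequence of DataFrames with batch sizes chosen by the connector rather than by you. The sketch below is only a hedged variation on the same idea, not a drop-in replacement; the function name `extract_with_pandas_batches` is just an illustrative choice:

    import snowflake.connector

    def extract_with_pandas_batches(conn, query, filename='extracted_data.csv'):
        cursor = conn.cursor()
        cursor.execute(query)

        # fetch_pandas_batches() yields DataFrames of connector-chosen size,
        # so memory stays bounded without a manual fetchmany() loop.
        first = True
        for batch in cursor.fetch_pandas_batches():
            # Write the header only for the first batch, then append.
            batch.to_csv(filename, mode='w' if first else 'a',
                         header=first, index=False)
            first = False

    The trade-off is that you give up control over the exact chunk size, but you gain Arrow-based fetches and DataFrame-level writing, which can be noticeably faster for wide tables.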