I have a table in a Snowflake database which contains 4 to 8 million rows.
I want to extract 4 million rows in chunks to reduce the load on the database.
On each iteration of the loop, the rows should be appended to the array r
(as shown in the last line of the code) in chunks, and at the end all the data should be saved to a CSV file.
I am new to Python. I have seen other answers, but they are not working as expected.
Can anyone help me?
chunksize = 1000000
query = "select * from emp_table;"
cursor = conn.cursor()
cursor.execute(query)
r = []
As you mentioned, the data should be loaded in chunks to reduce the load on the database. However, loading every chunk and appending it to r would make this a very memory-intensive task.
To avoid that, you can load a chunk, write it to the CSV file, then load the next chunk and repeat, so that memory is never overloaded.
Here is a sample:
import snowflake.connector
import csv

def extract_and_write_data(conn, query, chunksize=1000000,
                           filename='extracted_data.csv'):
    cursor = conn.cursor()
    cursor.execute(query)
    with open(filename, 'w', newline='') as f:
        writer = csv.writer(f)
        # Write the header row using the column names from the cursor metadata
        writer.writerow([col[0] for col in cursor.description])
        ''' The cursor holds the full result set; the loop below fetches
        chunksize rows at a time (first 1000000, then the next 1000000,
        and so on) and writes each chunk to the CSV file. When a fetch
        returns fewer than chunksize rows, there are no more rows left,
        so the loop breaks. '''
        while True:
            chunk = cursor.fetchmany(chunksize)
            writer.writerows(chunk)
            # Check if there are more rows to fetch
            if len(chunk) < chunksize:
                break
# Connect to Snowflake
conn = snowflake.connector.connect(
    user='your_user',
    password='your_password',
    account='your_account',
    warehouse='your_warehouse',
    database='your_database',
    schema='your_schema'
)
# SQL query to extract all rows from the table
query = "SELECT * FROM emp_table"
# Extract and write data to CSV
extract_and_write_data(conn, query)
# Close the connection
conn.close()
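If you have pandas available, another option is to let the connector stream the result set back as DataFrame batches, so you do not have to manage fetchmany yourself. Below is a minimal sketch of that approach, assuming snowflake-connector-python is installed with its pandas extras and using the same placeholder connection parameters as above:
import snowflake.connector

# Assumes snowflake-connector-python[pandas] is installed; the connection
# parameters below are placeholders, as in the sample above.
conn = snowflake.connector.connect(
    user='your_user',
    password='your_password',
    account='your_account',
    warehouse='your_warehouse',
    database='your_database',
    schema='your_schema'
)

cursor = conn.cursor()
cursor.execute("SELECT * FROM emp_table")

# fetch_pandas_batches() yields one pandas DataFrame per result batch,
# so each batch can be appended to the CSV without holding the whole
# table in memory. Batch sizes are chosen by the connector, not by you.
first_batch = True
for df in cursor.fetch_pandas_batches():
    df.to_csv('extracted_data.csv',
              mode='w' if first_batch else 'a',
              header=first_batch,
              index=False)
    first_batch = False

conn.close()
For very large tables you could also unload the data server-side with COPY INTO a stage and download the files, but the patterns above stay closest to your original code.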