snowflake-cloud-data-platform azure-data-factory

Ingest data from Snowflake


I’m working with a very large dataset that a customer shared with me in Snowflake. The dataset contains more than 1.5 billion rows, and I’m using Azure Data Factory (ADF) with a Copy Activity to extract the data.

However, when I run SELECT *, the query fails with the error:

Statement reached its statement or warehouse timeout of 600 seconds and was canceled.

I’ve confirmed that this is not a permission issue. I also tested fetching just 10 rows, and that worked successfully.

The challenge is that the dataset does not have a primary key (PK) column, so I can’t use logical partitioning to split the data into smaller batches.

My Questions:

  1. Can I extend the 600-second timeout in Snowflake? If so, how?

  2. What are alternative ways to efficiently extract such a large dataset using ADF and Snowflake?

  3. Are there best practices for handling large queries in Snowflake when working with ADF?

I’m not very familiar with Snowflake, so any guidance on optimizing this process would be appreciated.


Solution

  • Can I extend the 600-second timeout in Snowflake? If so, how?

    You can use STATEMENT_TIMEOUT_IN_SECONDS.

    [It] specifies the time, in seconds, after which a running SQL statement (query, DDL, DML, etc.) is canceled by the system.

    ALTER WAREHOUSE my_wh SET STATEMENT_TIMEOUT_IN_SECONDS=some_number;
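
    If you want to check the current limit first, or prefer a session-level override instead of changing the warehouse setting, something like the following works (my_wh is the same placeholder warehouse name as above, and 86400 is just an example value):

    SHOW PARAMETERS LIKE 'STATEMENT_TIMEOUT_IN_SECONDS' IN WAREHOUSE my_wh;
    ALTER SESSION SET STATEMENT_TIMEOUT_IN_SECONDS = 86400;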
    

    You can also optimize your warehouse by adjusting its size, so the full extract finishes well within the timeout (see the example below).
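
    For example (just a sketch: 'LARGE' is an arbitrary choice, and larger warehouses consume more credits per hour, so pick the smallest size that lets the extract finish in time):

    ALTER WAREHOUSE my_wh SET WAREHOUSE_SIZE = 'LARGE';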

    Also, make sure you are using the Snowflake V2 connector for ADF, which provides improved native Snowflake support.