snowflake-cloud-data-platform azure-data-factory

Ingest data from Snowflake


I’m working with a very large dataset that a customer shared with me in Snowflake. The dataset contains more than 1.5 billion rows, and I’m using Azure Data Factory (ADF) with a Copy Activity to extract the data.

However, when I run SELECT *, the query fails with the error:

Statement reached its statement or warehouse timeout of 600 seconds and was canceled.

I’ve confirmed that this is not a permission issue. I also tested fetching just 10 rows, and that worked successfully.

The challenge is that the dataset does not have a primary key (PK) column, so I can’t use logical partitioning to split the data into smaller batches.

My Questions:

  1. Can I extend the 600-second timeout in Snowflake? If so, how?

  2. What are alternative ways to efficiently extract such a large dataset using ADF and Snowflake?

  3. Are there best practices for handling large queries in Snowflake when working with ADF?

I’m not very familiar with Snowflake, so any guidance on optimizing this process would be appreciated.


Solution

  • Can I extend the 600-second timeout in Snowflake? If so, how?

    You can use STATEMENT_TIMEOUT_IN_SECONDS.

    [It] specifies the time, in seconds, after which a running SQL statement (query, DDL, DML, etc.) is canceled by the system.

    ALTER WAREHOUSE my_wh SET STATEMENT_TIMEOUT_IN_SECONDS=some_number;
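
    If you want to check the current limit first, or prefer a session-level override instead of changing the warehouse setting, something like the following works (my_wh is the same placeholder warehouse name as above, and 86400 is just an example value):

    SHOW PARAMETERS LIKE 'STATEMENT_TIMEOUT_IN_SECONDS' IN WAREHOUSE my_wh;
    ALTER SESSION SET STATEMENT_TIMEOUT_IN_SECONDS = 86400;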
    

    You can also optimize your warehouse by adjusting its size, so the full extract finishes well within the timeout (see the example below).
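
    For example (just a sketch: 'LARGE' is an arbitrary choice, and larger warehouses consume more credits per hour, so pick the smallest size that lets the extract finish in time):

    ALTER WAREHOUSE my_wh SET WAREHOUSE_SIZE = 'LARGE';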

    Also, make sure you are using the Snowflake V2 connector for ADF, which provides improved native Snowflake support.