
How to Process Large Data Using Multiple Threads in a Snowflake Stored Procedure


My requirement is as follows: I have a large amount of data in Snowflake, and I would like to process it using multiple threads in a JavaScript stored procedure. The records are independent of each other (there are no inter-dependencies), so to reduce the processing time I would like to process the data in parallel threads. Does anyone know how this can be done with multi-threading in JavaScript?


Solution

  • Unfortunately, the JavaScript stored procedure API doesn't support asynchronous queries or multi-threading (for now).

    An alternative is to use a tree of Snowflake tasks: because one task can have multiple dependent (child) tasks, those children are executed in parallel once their parent task completes.

    Here's the proof with a minimal example:

    In this example I created two queries that each take 10 seconds to run. With both defined as parallel child tasks in the tree, you can see that they executed at the same time.


    Both queries took 10 seconds to produce 5 billion rows each, and each then inserted a new row within the same sub-second window.
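The task-tree approach described in the answer can be sketched in JavaScript, the language of the original procedure. This is a minimal sketch, not an official API: the function only builds the SQL statements as strings, so it runs anywhere Node.js is available, and you would still execute the output in Snowflake yourself. Every name here is a hypothetical placeholder: the `etl` prefix, the `my_wh` warehouse, the `60 MINUTE` schedule, and the `process_partition(part_id, part_count)` stored procedure, which is assumed to process only the rows where `MOD(ABS(HASH(id)), part_count) = part_id` so that each child task handles a disjoint slice of the data.

```javascript
// Minimal sketch (not an official API): builds the Snowflake SQL for a
// fan-out task tree. All names passed in are hypothetical placeholders.
function buildTaskTreeSql({ prefix, warehouse, schedule, partitions, procedure }) {
  if (!Number.isInteger(partitions) || partitions < 1) {
    throw new Error("partitions must be a positive integer");
  }
  const root = `${prefix}_root`;
  const children = [];
  const statements = [];

  // Root task: fires on the schedule and does no real work; it only
  // exists so that the children share a common predecessor.
  statements.push(
    `CREATE OR REPLACE TASK ${root} WAREHOUSE = ${warehouse} ` +
      `SCHEDULE = '${schedule}' AS SELECT 1;`
  );

  // One child per partition. Each child depends only on the root, so all
  // of them become runnable at the same moment and can run in parallel.
  // The hypothetical procedure is assumed to process only the rows where
  // MOD(ABS(HASH(id)), partitions) = partition index.
  for (let i = 0; i < partitions; i++) {
    const child = `${prefix}_part_${i}`;
    children.push(child);
    statements.push(
      `CREATE OR REPLACE TASK ${child} WAREHOUSE = ${warehouse} ` +
        `AFTER ${root} AS CALL ${procedure}(${i}, ${partitions});`
    );
  }

  // Newly created tasks are suspended: resume the children, then the root.
  for (const child of children) {
    statements.push(`ALTER TASK ${child} RESUME;`);
  }
  statements.push(`ALTER TASK ${root} RESUME;`);
  return statements;
}

// Example: four parallel children, each calling the hypothetical procedure.
const sql = buildTaskTreeSql({
  prefix: "etl",
  warehouse: "my_wh",
  schedule: "60 MINUTE",
  partitions: 4,
  procedure: "process_partition",
});
sql.forEach((stmt) => console.log(stmt));
```

Each child depends only on the root, so all of them become eligible to run as soon as the root finishes; whether they actually overlap also depends on the warehouse having enough capacity for concurrent queries. To check that the runs overlapped, as in the answer's example, you can compare the `QUERY_START_TIME` and `COMPLETED_TIME` columns returned by `INFORMATION_SCHEMA.TASK_HISTORY()`.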