sqlsnowflake-cloud-data-platform

Is Snowflake's UNION BY NAME more computationally expensive than UNION?


Snowflake recently released a new function UNION [ALL] BY NAME, which is exciting! While I imagine it's minor, I'm wondering if this works just as performantly as UNION [ALL], or if there's an extra step the processor has to take that would make adding BY NAME less performant

I'm expecting that it does take longer, but by not enough to be confident in my quick side-by-side comparison testing


Solution

  • Feel completely confident using UNION BY NAME and UNION ALL BY NAME. There's an extra step, but this's just metadata manipulation during the incredibly fast query compilation phase.

    There's 2 main stages of a query's life:

    1. Compilation: parsing your query, validates it and generates an optimized execution plan. This phase is extremely fast

    2. Execution: this's where the virtual warehouse actually executes the plan: scanning micro-partitions, shuffling data, performing joins and aggregating results. This's where the vast majority of query time is always spent

    The extra work of UNION BY NAME / UNION ALL BY NAME in compilation phase includes the name resolution, mapping and potential restructuring of the query plan adds a small amount of logical overhead. But the performance difference is really academic, not practical, so it should not be a factor in your decision to use it. The benefits of code's quality, data integrity and maintainability provided by these new released functions far outweigh this technical performance cost. It's not something you'll ever notice in practice. Always use the query profile to find the real source of the slowdown.

    Think of it like analogy of planning a road trip with GPS navigator:

    In both cases the actual execution phase is exactly the same once the route is planned. The extra few seconds of planning are insignificant compared to a multi-hour drive.