Snowflake recently released a new function UNION [ALL] BY NAME, which is exciting! While I imagine it's minor, I'm wondering if this works just as performantly as UNION [ALL], or if there's an extra step the processor has to take that would make adding BY NAME less performant
I'm expecting that it does take longer, but by not enough to be confident in my quick side-by-side comparison testing
Feel completely confident using UNION BY NAME
and UNION ALL BY NAME
. There's an extra step, but this's just metadata manipulation during the incredibly fast query compilation phase.
There's 2 main stages of a query's life:
Compilation: parsing your query, validates it and generates an optimized execution plan. This phase is extremely fast
Execution: this's where the virtual warehouse actually executes the plan: scanning micro-partitions, shuffling data, performing joins and aggregating results. This's where the vast majority of query time is always spent
The extra work of UNION BY NAME
/ UNION ALL BY NAME
in compilation phase includes the name resolution, mapping and potential restructuring of the query plan adds a small amount of logical overhead. But the performance difference is really academic, not practical, so it should not be a factor in your decision to use it. The benefits of code's quality, data integrity and maintainability provided by these new released functions far outweigh this technical performance cost. It's not something you'll ever notice in practice. Always use the query profile to find the real source of the slowdown.
Think of it like analogy of planning a road trip with GPS navigator:
UNION
: you provide a simple list of coordinates in order. Planning is instant
UNION BY NAME
: you provide a list of landmark names ("Eiffel Tower," "Louvre"). The navigator has some compilation overhead looking up the coordinates for those names, validating and putting them in the right order
In both cases the actual execution phase is exactly the same once the route is planned. The extra few seconds of planning are insignificant compared to a multi-hour drive.