Imagine this scenario in SQL Server 2016: we have to tables A and B
We join A and B, and nothing happens and 1000 rows are returned in min time. But when we want to insert this result set into another table (memory optimized table OR normal table or even a temp table), it takes 10 to 20 seconds to insert.
Any ideas?
UPDATE : Execution plans for normal scenario and memory optimized table added
When a DML statement targets a Memory-Optimized table, the query cannot run in parallel, and the server will employ a serialized plan. So, your first statement runs in a single-core mode.
In the second instance, the DML statement leverages the fact that "SELECT INTO / FROM" is parallelizable. This behavior was added in SQL Server 2014. Thus, you get a parallel plan for that. Here is some information about this:
Reference: What's New (Database Engine) - SQL Server 2014
I have run into this problem countless times with Memory-Optimized targets. One solution I have found, if the I/O requirements are high on the retrieval, is to stage the result of the SELECT statement into a temporary table or other intermediate location, then insert from there into the Memory-Optimized table.
The third issue is that, by default, statements that merely read from a Memory-Optimized table, even if that table is not the target of DML, are also run in serialized fashion. There is a hotfix for this, which you can enable with a query hint.
The hint is used like this:
OPTION(HINT USE ('ENABLE_QUERY_OPTIMIZER_HOTFIXES'))
Reference: Update enables DML query plan to scan query memory-optimized tables in parallel in SQL Server 2016
In either case, any DML that has a memory-optimized table as a target is going to run on a single core. This is by design. If you need to exploit parallelism, you cannot do it if the Memory-Optimized table is the target of the statement. You will need to benchmark different approaches to find the one that performs best for your scenario.