I have an SSIS (SQL Server Integration Services) package in Visual Studio that contains a Data Flow Task. The Data Flow Task reads data from an OLE DB Source and loads it into an OLE DB Destination. The OLE DB Source is connected to a view in my database, and the SELECT statement for that view takes around 3 minutes and 20 seconds to execute.
The issue I'm facing is that when I try to open the OLE DB Source component in Visual Studio, it takes an extremely long time, sometimes over an hour, to load. This makes it very frustrating to edit the source and work on my SSIS package.
I have tried a few things to resolve this issue, but none of them have provided a satisfactory solution.
At this point, I'm not sure what else I can do to address this issue. I'm open to any suggestions or advice on how to improve the loading time when opening the OLE DB Source component in Visual Studio.
Is there anything else I can try to speed up the loading time? Are there any additional settings or optimizations that I might have missed? Any guidance would be greatly appreciated.
I'm facing the same issue as the OP and found the underlying cause in my case.
Summary
Before finding this post and diving deeper in the investigation, I simply updated my query with TOP 1 while developing the ETL. Although, @Tom Thomson's workaround of using a SQL Command for the Data access mode and prefixing it with SET ROWCOUNT 0
works and is less of an assle, investigation and testing showed that the SET ROWCOUNT is simply not needed. The simple fact of using SQL Command is enough for SSIS to issue a SP_PREPARE command instead of the SELECT * query and retrieve the metadata quickly.
So the shortest answer if you're having an issue with a long loading time when opening an OLE DB data source is: Use 'SQL Command' instead of 'Table or view'.
For more details, continue reading.
Investigation details
But to understand the issue I did look at the activities behind the validation. It's executing:
SET ROWCOUNT 1
SELECT * FROM myView
It's not performing the sys.dm_exec_describe_first_result_set
query as stated by another user.
The SSIS default method works quite well with most of the tables and views I've tested. However, I'm having issues with my recent views.
Without ROWCOUNT my view takes 1 second, with it, it takes forever (I cancelled before completing). The SET ROWCOUNT 1 is affecting the execution plan.
Without ROWCOUNT no steps went over 100% of planned rows to retrieve and matches the number of rows in the underlying tables or less.
With it, there seems to be more steps and the expected rows to retrieve is very low but ends up retrieving all the table rows or a lot more then the number of rows that exist in the table/index specified in a step.
But the biggest difference is that, with ROWCOUNT, a Table Spool is inserted in the plan and just keep chugging in rows:
Without ROWCOUNT, there's no Table Spool in the plan and no excessive row processing:
With that information I removed the Linked Server part of my query and tested again and this time both executions returned quickly, with and without ROWCOUNT.
I have other complex queries as source and didn't have issues with them in the past, and did remember seeing SSIS performing sys.dm_exec_describe_first_result_set
commands so I took a look into them and noticed that it's because I'd wrapped them in Stored Procedure (SP) and use SQL command as the Data access mode in my OLE DB Source. The validation step issues a sp_prepare
statement on the SP followed by a sys.dm_exec_describe_first_result_set
command. I embedded my problematic query in a SP and it worked just fine. It loaded very quickly.
After further testing, I found out that the underlying reason it works well with a SP is simply because SSIS issues a SP_PREPARE statement when using ANY SQL Command as the Data access mode. Therefore, the other answer that adds SET ROWCOUNT 0 is not totally accurate, as that prefix is simply not needed.
With that, I'm wondering why, if other more effecient methods like sys.dm_exec_describe_first_result_set
or sp_prepare
exists and works to retrieve the metadata, SSIS uses SET ROWCOUNT 1 to retrieve the information when "Table or view" is selected?