asp.netsql-servernetwork-programmingsitecoresitecore8.1

Intermittent SQL connection error


I have an ASP.NET (Sitecore) application, and the logs are showing intermittent SQL connection errors in our customer's production environment. The exception is as follows:

Exception: System.Data.SqlClient.SqlException
Message: A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: Named Pipes Provider, error: 40 - Could not open a connection to SQL Server)
Source: .Net SqlClient Data Provider
   at System.Data.SqlClient.SqlInternalConnectionTds..ctor(DbConnectionPoolIdentity identity, SqlConnectionString connectionOptions, SqlCredential credential, Object providerInfo, String newPassword, SecureString newSecurePassword, Boolean redirectedUserInstance, SqlConnectionString userConnectionOptions, SessionData reconnectSessionData, DbConnectionPool pool, String accessToken, Boolean applyTransientFaultHandling)
   at System.Data.SqlClient.SqlConnectionFactory.CreateConnection(DbConnectionOptions options, DbConnectionPoolKey poolKey, Object poolGroupProviderInfo, DbConnectionPool pool, DbConnection owningConnection, DbConnectionOptions userOptions)
   at System.Data.ProviderBase.DbConnectionFactory.CreatePooledConnection(DbConnectionPool pool, DbConnection owningObject, DbConnectionOptions options, DbConnectionPoolKey poolKey, DbConnectionOptions userOptions)
   at System.Data.ProviderBase.DbConnectionPool.CreateObject(DbConnection owningObject, DbConnectionOptions userOptions, DbConnectionInternal oldConnection)
   at System.Data.ProviderBase.DbConnectionPool.UserCreateRequest(DbConnection owningObject, DbConnectionOptions userOptions, DbConnectionInternal oldConnection)
   at System.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, UInt32 waitForMultipleObjectsTimeout, Boolean allowCreate, Boolean onlyOneCheckConnection, DbConnectionOptions userOptions, DbConnectionInternal& connection)
   at System.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal& connection)
   at System.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection)
   at System.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)
   at System.Data.SqlClient.SqlConnection.TryOpenInner(TaskCompletionSource`1 retry)
   at System.Data.SqlClient.SqlConnection.TryOpen(TaskCompletionSource`1 retry)
   at System.Data.SqlClient.SqlConnection.Open()
   at Sitecore.Data.DataProviders.Sql.DataProviderCommand..ctor(IDbCommand command, DataProviderTransaction transaction, Boolean openConnection)
   at Sitecore.Data.DataProviders.Sql.SqlDataApi.<>c__DisplayClass4.<CreateCommand>b__3()
   at Sitecore.Data.DataProviders.NullRetryer.Execute[T](Func`1 action, Action recover)
   at Sitecore.Data.DataProviders.Sql.SqlDataApi.<>c__DisplayClass12.<CreateReader>b__10()
   at Sitecore.Data.DataProviders.NullRetryer.Execute[T](Func`1 action, Action recover)
   at Sitecore.Data.DataProviders.Sql.SqlDataApi.CreateReader(String sql, Object[] parameters)
   at Sitecore.Data.DataProviders.Sql.SqlDataApi.<CreateObjectReader>d__6`1.MoveNext()
   at System.Linq.Enumerable.FirstOrDefault[TSource](IEnumerable`1 source)
   at Sitecore.Eventing.EventQueue.ProcessEvents(Action`2 handler)
   at Sitecore.Eventing.EventProvider.RaiseQueuedEvents()

Nested Exception

Exception: System.ComponentModel.Win32Exception
Message: The network path was not found


6420 16:53:53 ERROR Exception processing remote events from database: web
Exception: System.Data.SqlClient.SqlException
Message: A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: Named Pipes Provider, error: 40 - Could not open a connection to SQL Server)
Source: .Net SqlClient Data Provider
   at System.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, UInt32 waitForMultipleObjectsTimeout, Boolean allowCreate, Boolean onlyOneCheckConnection, DbConnectionOptions userOptions, DbConnectionInternal& connection)
   at System.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal& connection)
   at System.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection)
   at System.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)
   at System.Data.SqlClient.SqlConnection.TryOpenInner(TaskCompletionSource`1 retry)
   at System.Data.SqlClient.SqlConnection.TryOpen(TaskCompletionSource`1 retry)
   at System.Data.SqlClient.SqlConnection.Open()
   at Sitecore.Data.DataProviders.Sql.DataProviderCommand..ctor(IDbCommand command, DataProviderTransaction transaction, Boolean openConnection)
   at Sitecore.Data.DataProviders.Sql.SqlDataApi.<>c__DisplayClass4.<CreateCommand>b__3()
   at Sitecore.Data.DataProviders.NullRetryer.Execute[T](Func`1 action, Action recover)
   at Sitecore.Data.DataProviders.Sql.SqlDataApi.<>c__DisplayClass12.<CreateReader>b__10()
   at Sitecore.Data.DataProviders.NullRetryer.Execute[T](Func`1 action, Action recover)
   at Sitecore.Data.DataProviders.Sql.SqlDataApi.CreateReader(String sql, Object[] parameters)
   at Sitecore.Data.DataProviders.Sql.SqlDataApi.<CreateObjectReader>d__6`1.MoveNext()
   at System.Linq.Enumerable.FirstOrDefault[TSource](IEnumerable`1 source)
   at Sitecore.Eventing.EventQueue.ProcessEvents(Action`2 handler)
   at Sitecore.Eventing.EventProvider.RaiseQueuedEvents()

Nested Exception

Exception: System.ComponentModel.Win32Exception
Message: The network path was not found

This error is typical if you have provided an invalid connection string where the server you have specified doesn't exist or isn't accessible on the network. However, the site functions fine 99% of the time.

In this example the error is coming from Sitecore's RaiseQueuedEvents scheduled task, but the exception is also raised from various other places, including when hitting a URL in the site, resulting in a http 500.

The interesting thing is that they come in waves, i.e. there could be up to 100 of these exceptions within the space of a couple of seconds.

Our customer's infrastructure team who manage the servers are pretty adamant that it's not a network issue, and is something wrong with the application code, and have reported that there is an increase in DB traffic at the time these exceptions seem to occur:

(All these are CLEAR SPIKES compared to usual performance) - at 10:10:14 – number if user connections increased from 60 to 90 - at 10:10:14 – number of “batch requests / s” increased from around 60 to 650 - at 10:10:32 – “disk avg. READ time” increased from 1ms to 8.4ms - at 10:10:32 – network utilisation spiked from 0.3% to 18%

There were NO network drops registered by the sql monitor, there was NO impact on the server CPU utilisation.

I'm not an expert in networking or SQL performance but to me, these stats don't seem unreasonable, or would be a cause for subsequent connection attempts to receive the 'network path not found' exception; if the server was busy, I'd expect to receive a timeout exception?

I've contacted Sitecore support who have swiftly suggested it is a network issue:

Based on these exceptions it doesn't seem they are Sitecore related. Messages clearly state that you have some kind of network error, so it will be appropriate to investigate further along with your Infrastructure team. I reviewed similar issues in our database and can highlight the following areas. - Remote connection was forcibly closed/disabled - Server was offline - Something related to wrong security context. Firewall and antiviruses may affect that.

Currently, we're at loggerheads; my feeling being that the error message suggests it must be a networking issue, but with their team of the opinion the site is broken in some way.

How can I diagnose what the problem is? Could there be some issue with the code/sitecore or is this a networking issue?

Update: Network details

The database server is hosted on a different network and is networked via a VLAN I believe. The servers are load balanced, which * I think * may be done using a firewall rather than a proper load balancer.

UPDATE 2

The problem was that SQL was configured to allow both TCP and Named Pipes. Sometimes it would try and connect with the latter which doesn't use the standard SQL port. The solution was to prefix the data source/server in the connection string with Data Source=tcp:xxx.xxx.xx.xxx to always ensure it connects via TCP


Solution

  • It wasn't related to Sitecore, but I have previously seen a very similar situation with another Content Management system. I also had a similar challenge with infrastructure staff being convinced the database server was fine, and that the problem was with the website.

    I suspected a network issue, so my approach was:

    1. I wrote a SQL script which could be run from the command line and would show up the same connectivity issue.

    2. I ran that script on a web server and I ran the script on the database server itself. I recorded the outcomes and compared them.

    My tests showed that the error could not be reproduced at all when the script ran on the DB server, but that it did occur when run from the web server's command line. That was evidence to back up my hunch that the issue was connectivity related, and did not relate to either the website or the database server.

    That focused attention on the setup of the firewall dividing the website's DMZ from the internal network the SQL server was on. This firewall was a load-balanced pair - and we were eventually able to find a subtle configuration difference which was causing the second box to terminate connections which had been started through the first box.

    It seems unlikely that that specific issue would be causing the problem for you - but you may find the overall approach of coming up with tests which can help localise the cause of the problem helpful?