javasql-serverhibernatesql-server-2016

SQL server hanged suddenly - all DB connections are active but no response


I am trying to find out why our SQL server suddenly hanged and It became normal after restart

Symptoms:

Impact duration : 5 hours (until restart)

Tech stack:

Limitation: Team had restarted the SQL server before we can export any statistics from DB and even doubtful whether we were able to run statistic SQL queries before SQL server restarted as the DB was hanged already

Findings/actions: After DB was restarted, I tried to extract statistics from dm_exec_query_stats, however, it tracks queries based on last run time only. There was no result for affected period. Same scenario for dm_os_waiting_tasks as well. Server team say that the CPU and Memory usage were normal (I still to receive complete report)

Could see no error/problem from Windows event log and cluster logs. They look normal

Some Google sources say that some queries may consume complete CPU which may make SQL server to hang, some others say that some queries might have made blocking.

It may look simple or common for SQL server experts/DBA, however, I have googled for finding out relevant issue and resolution, however they doesn't seem to help

Tried these queries but no joy

    SELECT deqs.last_execution_time AS [Time], dest.TEXT AS [Query], dbid, deqs.*
    FROM sys.dm_exec_query_stats AS deqs
    CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest
    where deqs.last_execution_time between '2020-09-29 13:16:52.710' and '2020-09-29 23:16:52.710'
     ORDER BY deqs.last_execution_time DESC ;



         SELECT 
        qs.sql_handle,
        qs.execution_count,
        qs.total_worker_time AS Total_CPU,
        total_CPU_inSeconds = --Converted from microseconds
            qs.total_worker_time/1000000,
        average_CPU_inSeconds = --Converted from microseconds
            (qs.total_worker_time/1000000) / qs.execution_count,
        qs.total_elapsed_time,
        total_elapsed_time_inSeconds = --Converted from microseconds
            qs.total_elapsed_time/1000000,
        st.text,qs.query_hash,
        qp.query_plan
    FROM
        sys.dm_exec_query_stats AS qs
    CROSS APPLY 
        sys.dm_exec_sql_text(qs.sql_handle) AS st
    CROSS APPLY
        sys.dm_exec_query_plan (qs.plan_handle) AS qp
        where qs.last_execution_time between '2020-09-29 13:16:52.710' and '2020-09-29 23:16:52.710'
    ORDER BY 
        qs.total_worker_time DESC;

--View waiting tasks per connection
SELECT st.text AS [SQL Text], c.connection_id, w.session_id, 
  w.wait_duration_ms, w.wait_type, w.resource_address, 
  w.blocking_session_id, w.resource_description, c.client_net_address, c.connect_time
FROM sys.dm_os_waiting_tasks AS w
INNER JOIN sys.dm_exec_connections AS c ON w.session_id = c.session_id 
CROSS APPLY (SELECT * FROM sys.dm_exec_sql_text(c.most_recent_sql_handle)) AS st 
              WHERE w.session_id > 50 AND w.wait_duration_ms > 0
ORDER BY c.connection_id, w.session_id
GO

-- View waiting tasks for all user processes with additional information
SELECT 'Waiting_tasks' AS [Information], owt.session_id,
    owt.wait_duration_ms, owt.wait_type, owt.blocking_session_id,
    owt.resource_description, es.program_name, est.text,
    est.dbid, eqp.query_plan, er.database_id, es.cpu_time,
    es.memory_usage*8 AS memory_usage_KB
FROM sys.dm_os_waiting_tasks owt
INNER JOIN sys.dm_exec_sessions es ON owt.session_id = es.session_id
INNER JOIN sys.dm_exec_requests er ON es.session_id = er.session_id
OUTER APPLY sys.dm_exec_sql_text (er.sql_handle) est
OUTER APPLY sys.dm_exec_query_plan (er.plan_handle) eqp
WHERE es.is_user_process = 1
ORDER BY owt.session_id;
GO;

Solution

  • The problem was resolved

    Root cause: DB session was not committed after update query. Hence all other threads (accessing same row of the table) started to wait for locks which consumed all threads.