I am trying to find out why our SQL server suddenly hanged and It became normal after restart
Symptoms:
Java threads started getting stuck, figured out that the Java JDBC connections started hanging without any response from DB which caused threads to stuck
All connections (around 100 ) were active until SQL server was restarted. Finally, DB connections were closed by DB after restart. Java JDBC connection received 'Connection reset by peer' AFTER DB was restarted
Impact duration : 5 hours (until restart)
Tech stack:
Limitation: Team had restarted the SQL server before we can export any statistics from DB and even doubtful whether we were able to run statistic SQL queries before SQL server restarted as the DB was hanged already
Findings/actions: After DB was restarted, I tried to extract statistics from dm_exec_query_stats, however, it tracks queries based on last run time only. There was no result for affected period. Same scenario for dm_os_waiting_tasks as well. Server team say that the CPU and Memory usage were normal (I still to receive complete report)
Could see no error/problem from Windows event log and cluster logs. They look normal
Some Google sources say that some queries may consume complete CPU which may make SQL server to hang, some others say that some queries might have made blocking.
It may look simple or common for SQL server experts/DBA, however, I have googled for finding out relevant issue and resolution, however they doesn't seem to help
Tried these queries but no joy
SELECT deqs.last_execution_time AS [Time], dest.TEXT AS [Query], dbid, deqs.*
FROM sys.dm_exec_query_stats AS deqs
CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest
where deqs.last_execution_time between '2020-09-29 13:16:52.710' and '2020-09-29 23:16:52.710'
ORDER BY deqs.last_execution_time DESC ;
SELECT
qs.sql_handle,
qs.execution_count,
qs.total_worker_time AS Total_CPU,
total_CPU_inSeconds = --Converted from microseconds
qs.total_worker_time/1000000,
average_CPU_inSeconds = --Converted from microseconds
(qs.total_worker_time/1000000) / qs.execution_count,
qs.total_elapsed_time,
total_elapsed_time_inSeconds = --Converted from microseconds
qs.total_elapsed_time/1000000,
st.text,qs.query_hash,
qp.query_plan
FROM
sys.dm_exec_query_stats AS qs
CROSS APPLY
sys.dm_exec_sql_text(qs.sql_handle) AS st
CROSS APPLY
sys.dm_exec_query_plan (qs.plan_handle) AS qp
where qs.last_execution_time between '2020-09-29 13:16:52.710' and '2020-09-29 23:16:52.710'
ORDER BY
qs.total_worker_time DESC;
--View waiting tasks per connection
SELECT st.text AS [SQL Text], c.connection_id, w.session_id,
w.wait_duration_ms, w.wait_type, w.resource_address,
w.blocking_session_id, w.resource_description, c.client_net_address, c.connect_time
FROM sys.dm_os_waiting_tasks AS w
INNER JOIN sys.dm_exec_connections AS c ON w.session_id = c.session_id
CROSS APPLY (SELECT * FROM sys.dm_exec_sql_text(c.most_recent_sql_handle)) AS st
WHERE w.session_id > 50 AND w.wait_duration_ms > 0
ORDER BY c.connection_id, w.session_id
GO
-- View waiting tasks for all user processes with additional information
SELECT 'Waiting_tasks' AS [Information], owt.session_id,
owt.wait_duration_ms, owt.wait_type, owt.blocking_session_id,
owt.resource_description, es.program_name, est.text,
est.dbid, eqp.query_plan, er.database_id, es.cpu_time,
es.memory_usage*8 AS memory_usage_KB
FROM sys.dm_os_waiting_tasks owt
INNER JOIN sys.dm_exec_sessions es ON owt.session_id = es.session_id
INNER JOIN sys.dm_exec_requests er ON es.session_id = er.session_id
OUTER APPLY sys.dm_exec_sql_text (er.sql_handle) est
OUTER APPLY sys.dm_exec_query_plan (er.plan_handle) eqp
WHERE es.is_user_process = 1
ORDER BY owt.session_id;
GO;
The problem was resolved
Root cause: DB session was not committed after update query. Hence all other threads (accessing same row of the table) started to wait for locks which consumed all threads.