I recently experienced a blocking issue in a production environment which had been going on for hours. As a responsible DBA, I tried to kill the head of the blocking chain. Unexpectedly, killing the session seemed to have no effect. The blocking continued and the session seemed to stick around.
How to Create an Unkillable Session
It’s surprisingly easy to create a session that seemingly can’t be killed. Use good judgment in picking an environment to run this code against. First you’ll need to enable the “Ad Hoc Distributed Queries” configuration option if you haven’t done so already:
sp_configure 'Ad Hoc Distributed Queries', 1; RECONFIGURE; GO sp_configure;
You can then run the following code after replacing the SERVER_NAME and INSTANCE_NAME placeholder values with the server name and instance name that you’re running against. In other words, you want the OPENROWSET call to point at the same server that you’re executing the query against:
BEGIN TRANSACTION; SELECT TOP (1) [name] FROM master.dbo.spt_values WITH (TABLOCKX); SELECT TOP (1) d.[name] FROM OPENROWSET('SQLNCLI', '{{SERVER_NAME}}\{{INSTANCE_NAME}};Trusted_Connection=yes;', master.dbo.spt_values) AS d;
This code seems to run forever. Trying to kill the session is not effective, even though there’s no error message and a line written to the error log claiming that the session was killed. Running kill with the STATUSONLY option gives us the following:
SPID 56: transaction rollback in progress. Estimated rollback completion: 0%. Estimated time remaining: 0 seconds.
Trying to cancel the query using SSMS also seems to have no effect as well.
The Undetected Deadlock
Running sp_whoisactive reveals a pretty alarming state:
Session 56 (the one that I can see in SSMS) is waiting on an OLEDB wait. Based on the documentation, I assume that you’ll get this wait type while waiting for the OPENROWSET call to complete:
Includes all connection information that is required to access remote data from an OLE DB data source. This method is an alternative to accessing tables in a linked server and is a one-time, ad hoc method of connecting and accessing remote data by using OLE DB.
I did get callstacks for the OLEDB wait using the wait_info_external extended event, but there were two problems. The first problem is that wait_info_external fires after the wait is complete and the OLEDB wait won’t finish. The second problem is that the callstacks that I did get weren’t very interesting.
Getting back on topic, I believe that the OLEDB wait for session 56 can only end once session 66 completes its work. Session 66 is the session that was created by the OPENROWSET call. However, session 66 needs an IS lock on the spt_values table. That is not compatible with the exclusive lock held on that object by session 56. So session 56 is waiting on session 66 but session 66 is waiting on session 56. This is an undetected deadlock. The deadlock can be resolved by killing session 66 (the OPENROWSET query).
Note that if you are crazy enough to try this on your own, you may see a MEMORY_ALLOCATION_EXT wait instead of an OLEDB wait. I don’t believe that there’s any meaningful distinction there, but don’t assume that a long running OLEDB wait is required for this problem to occur.
Final Thoughts
This blocking issue was unusual in that killing the blocking session doesn’t resolve the issue. I had to kill the session getting blocked instead. Thanks for reading!
hmmm. i’ll hafta try this later.
i wonder if, like workers sometimes hanging around after CHECKDB is terminated, a DBCC INPUTBUFFER can be enough of a kick in the shins to get the session to self-terminate.