SS 2022
CU 13
Greetings. Last weekend we moved to a new SQL server, and last night didnt go so well. When I logged in this morning we had LOTS of blocking from spid -5. I can't find much info on this but did find this link that shares this detail:
In short, if you see a block on session id -5, don’t focus on the block – focus on the wait type that the session was encountering. In the example above fixing slow disk IO would help alleviate the performance problem. To be honest, I’m not wild that Microsoft introduced this – it is misleading and will have people chasing red herrings.
We did see some solid disk usage (wait types were disk related), but honestly I don't think thats the deal, and neither do our sysadmins.
Anyways, this blocking lasted for a few more hours. Of course it couldnt be killed as it wasnt really active, and we finally resorted to a reboot which fixed the issue. Unfortunately on my new server I do have a trace for any query taking >= 25 seconds, but it was configured incorrectly so I cannot see what queries kicked off right before these message started. I do know that the server didn't have enough RAM, but again am not inclined to look at the hardware at this point.
Here's some more info for your viewing pleasure, any ideas are appreciated;
--Im thinking symptom not cause. We got this for all 8 tempDB files:
SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [P:\tempDB\tempdb_mssql_8.ndf] in database id 2. The OS file handle is 0x00000000000011BC. The offset of the latest long I/O is: 0x000000d419a000. The duration of the long I/O is: 26243 ms.
--LOTS of these, new to me:
Message
<frame id="00" address="0x7FF801D45B75" pdb="sqlmin.pdb" age="2" guid="FA28D090-6D0F-402A-B557-0C7B2D60C0EA" module="sqlmin" rva="0x1475B75" />
Message
A time-out occurred while waiting for buffer latch -- type 2, bp 000001DC1841F480, page 3:1022464, stat 0x10f, database id: 2, allocation unit Id: 6488064, task 0x000001C202FAA108 : 8, waittime 300 seconds, flags 0x3a, owning task 0x000001D51BFA1088. Not continuing to wait.
Message
Latch Owner (No frames may indicate task has exited.)
--this one does suggest RAM
Message
Error: 833, Severity: 10, State: 1. (Params:). The error is printed in terse mode because there was error during formatting. Tracing, ETW, notifications etc are skipped.