Black screen and system crash after wake from suspend state (bug check help)

Lucide 1 Reputation point
2024-07-23T22:57:20.3933333+00:00

For the last three weeks I've been experiencing system crashes roughly once every 3-4 wakes from suspend. The PC goes to sleep normally, but when it's woken up it shows a blank black screen for around 10 minutes, after which it reboots and a MEMORY.DMP is generated in the system folder.

I've found no interesting entries in the Event Viewer, just the usual warnings and errors from the unexpected restart.

My issue seems really similar to this one, but in my case regarding suspension, not hibernation.

I've tried examining the dumps myself following the answer from @Gary Nebbett and the referenced blog post, but my dumps are different enough that I got stuck. I believe I can call myself a C programmer, but debugging a Windows memory dump in WinDbg is definitely outside my expertise 🥺.

Here's what I got: WinDbg.txt

I haven't found "Lock" mentions in the thread traces I've examined, but again I'm not really sure what I'm doing.

Here's a OneDrive link with three dumps of three separate incidents.

Additional notes:

  • Windows 10 Pro 22H2 19045.4651, Nvidia GPU
  • No third party security products
  • EpicWebHelper.exe is present in the system but not in execution during all these incidents.
  • Some portable software might be running from a non-system hdd (ntfs junction from C: to D:). A search for cef gave no results tho. I know some non-background tools use it.
  • I keep Playnite running in the background in the system ssd, which uses CefSharp
Windows 10

2 answers

  1. Gary Nebbett 6,066 Reputation points
    2024-07-29T08:29:44.48+00:00

    Hello Lucide,

    Power suspend/resume processing goes through various phases; when resuming power, the three main phases are WakeDevices, ResumeServices and ResumeApps. The first two phases complete successfully but a deadlock is occurring during the ResumeApps phase.

    Here are the two threads involved (the situation is the same in all three dumps; this data is taken from the first dump):

    THREAD ffff998bb05d7040  Cid 0004.1418  Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT: (WrPushLock) KernelMode Non-Alertable
        fffff400706c1e70  SynchronizationEvent
    IRP List:
        ffff998bb11fb4b0: (0006,0118) Flags: 00000000  Mdl: 00000000
    Not impersonating
    DeviceMap                 ffffab0963247b40
    Owning Process            ffff998b834a0040       Image:         System
    Attached Process          N/A            Image:         N/A
    Wait Start TickCount      1478860        Ticks: 19039 (0:00:04:57.484)
    Context Switch Count      701            IdealProcessor: 10  NoStackSwap
    UserTime                  00:00:00.000
    KernelTime                00:00:00.031
    Win32 Start Address nt!ExpWorkerThread (0xfffff801816c8fc0)
    Stack Init fffff400706c2b90 Current fffff400706c1a80
    Base fffff400706c3000 Limit fffff400706bc000 Call 0000000000000000
    Priority 15 BasePriority 12 PriorityDecrement 0 IoPriority 2 PagePriority 5
    Child-SP          RetAddr               Call Site
    fffff400`706c1ac0 fffff801`81638a30     nt!KiSwapContext+0x76
    fffff400`706c1c00 fffff801`81637f5f     nt!KiSwapThread+0x500
    fffff400`706c1cb0 fffff801`81637803     nt!KiCommitThreadWait+0x14f
    fffff400`706c1d50 fffff801`816f2b50     nt!KeWaitForSingleObject+0x233
    fffff400`706c1e40 fffff801`8163cc12     nt!ExfAcquirePushLockExclusiveEx+0x1a0
    fffff400`706c1ef0 fffff801`9bbc2a76     nt!ExAcquirePushLockExclusiveEx+0x1a2
    fffff400`706c1f30 fffff801`9bbc810a     dxgkrnl!DXGFASTMUTEX::Acquire+0xb6
    fffff400`706c1f60 fffff801`9bd37419     dxgkrnl!EXCLUSIVEACCESS<VIDPN_MGR>::EXCLUSIVEACCESS<VIDPN_MGR>+0x1e
    fffff400`706c1f90 fffff801`9bd37aac     dxgkrnl!VIDPN_MGR::_MonitorEventHandler+0x109
    fffff400`706c2000 fffff801`9bd4a02b     dxgkrnl!MONITOR_MGR::_IssueMonitorEvent+0x1e0
    fffff400`706c20e0 fffff801`9bdbf143     dxgkrnl!DXGMONITOR::_UpdateEDIDBaseBlock+0x25b
    fffff400`706c2160 fffff801`9bd4c255     dxgkrnl!DXGMONITOR::_OnMonitorDeviceNodeReady+0x7647f
    fffff400`706c21a0 fffff801`9bd25a0a     dxgkrnl!MonitorNotifyDeviceNodeReady+0x139
    fffff400`706c22a0 fffff801`9bd1e16a     dxgkrnl!DpiPdoDispatchPnp+0x23a
    fffff400`706c2340 fffff801`afc69a78     dxgkrnl!DpiDispatchPnp+0xea
    fffff400`706c2460 fffff801`8162d3f5     nvlddmkm+0x1519a78
    fffff400`706c2580 fffff801`81a78bb8     nt!IofCallDriver+0x55
    fffff400`706c25c0 fffff801`8177f5fb     nt!IopSynchronousCall+0xf8
    fffff400`706c2630 fffff801`81b4390d     nt!PnpIrpDeviceEnumerated+0x3f
    fffff400`706c26c0 fffff801`81b3ffb4     nt!PiProcessNewDeviceNode+0xa4d
    fffff400`706c2890 fffff801`81b4b4cc     nt!PipProcessDevNodeTree+0x380
    fffff400`706c2960 fffff801`81770156     nt!PiProcessReenumeration+0x88
    fffff400`706c29b0 fffff801`816c90c5     nt!PnpDeviceActionWorker+0x206
    fffff400`706c2a70 fffff801`81748da5     nt!ExpWorkerThread+0x105
    fffff400`706c2b10 fffff801`81806de8     nt!PspSystemThreadStartup+0x55
    fffff400`706c2b60 00000000`00000000     nt!KiStartSystemThread+0x28
    
    

    and

    THREAD ffff998ba24a9080  Cid 06c8.1f20  Teb: 0000008614c3a000 Win32Thread: ffff998ba16349a0 WAIT: (WrResource) KernelMode Non-Alertable
        fffff4006fb5e848  SynchronizationEvent
    Not impersonating
    DeviceMap                 ffffab096c36ae70
    Owning Process            ffff998ba1b62080       Image:         dwm.exe
    Attached Process          N/A            Image:         N/A
    Wait Start TickCount      1497875        Ticks: 24 (0:00:00:00.375)
    Context Switch Count      457            IdealProcessor: 8             
    UserTime                  00:00:00.031
    KernelTime                00:00:00.109
    Win32 Start Address 0x00007fffb499d110
    Stack Init fffff4006fb5fb90 Current fffff4006fb5e390
    Base fffff4006fb60000 Limit fffff4006fb59000 Call 0000000000000000
    Priority 15 BasePriority 13 PriorityDecrement 0 IoPriority 2 PagePriority 5
    Child-SP          RetAddr               Call Site
    fffff400`6fb5e3d0 fffff801`81638a30     nt!KiSwapContext+0x76
    fffff400`6fb5e510 fffff801`81637f5f     nt!KiSwapThread+0x500
    fffff400`6fb5e5c0 fffff801`81637803     nt!KiCommitThreadWait+0x14f
    fffff400`6fb5e660 fffff801`8163404d     nt!KeWaitForSingleObject+0x233
    fffff400`6fb5e750 fffff801`8163e63a     nt!ExpWaitForResource+0x6d
    fffff400`6fb5e7d0 fffff801`8163e0a4     nt!ExpAcquireResourceSharedLite+0x4da
    fffff400`6fb5e890 fffff801`9bceb764     nt!ExAcquireResourceSharedLite+0x44
    fffff400`6fb5e8d0 fffff801`9bbda062     dxgkrnl!<lambda_3a429c02e21bb855f1ec386a1cface2b>::operator()+0x440
    fffff400`6fb5f1a0 fffff801`9bcf189d     dxgkrnl!<lambda_3dc479c6339d8ea3367aebfddfa054a6>::<lambda_invoker_cdecl>+0x12
    fffff400`6fb5f1d0 fffff801`9bcf00d3     dxgkrnl!DXGGLOBAL::IterateAdaptersWithCallback+0x191
    fffff400`6fb5f250 fffff801`9bcefea9     dxgkrnl!DxgkDDisplayEnumCore+0x4b
    fffff400`6fb5f2a0 fffff801`9bcee94b     dxgkrnl!DxgkDDisplayEnumInternal+0x1b9
    fffff400`6fb5f9d0 fffff801`81811d05     dxgkrnl!DxgkDDisplayEnum+0xb
    fffff400`6fb5fa00 00007fff`b2644984     nt!KiSystemServiceCopyEnd+0x25 (TrapFrame @ fffff400`6fb5fa00)
    00000086`14efee38 00000000`00000000     0x00007fff`b2644984
    
    

    They are linked by an ERESOURCE and an EX_PUSH_LOCK. Here is the ERESOURCE:

    Resource @ 0xffff998ba5eb0138    Exclusively owned
        Contention Count = 1
        NumberOfSharedWaiters = 1
         Threads: ffff998bb05d7040-01<*> ffff998ba24a9080-01
    
    

    It is owned by one of the threads and the other thread is waiting to acquire it. However, the owning thread is waiting to acquire an EX_PUSH_LOCK, and this is held by the other thread (push lock ownership is not recorded, so I can't show this, but I am confident that it is the case).
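
    To make the ownership explicit, the Threads: line of the listing above can be split into owner and non-owner entries. This is a minimal Python sketch, not a debugger feature: it assumes that the <*> suffix marks the owning thread (which is consistent with the stacks shown, where thread ffff998bb05d7040 is the one still trying to take the push lock and ffff998ba24a9080 is blocked in ExAcquireResourceSharedLite). The function name split_owner_and_waiters is made up for illustration.

```python
import re

# Sample text copied from the ERESOURCE listing in this answer.
RESOURCE_TEXT = """\
Resource @ 0xffff998ba5eb0138    Exclusively owned
    Contention Count = 1
    NumberOfSharedWaiters = 1
     Threads: ffff998bb05d7040-01<*> ffff998ba24a9080-01
"""

# One entry on the "Threads:" line: thread address, "-", count, optional "<*>".
ENTRY = re.compile(r"([0-9a-f]{16})-([0-9a-f]+)(<\*>)?")


def split_owner_and_waiters(text):
    """Return (owners, others) thread addresses from a resource listing.

    Assumption: "<*>" marks the thread that owns the resource.
    """
    owners, others = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped.startswith("Threads:"):
            continue
        for address, _count, star in ENTRY.findall(stripped):
            (owners if star else others).append(address)
    return owners, others


owners, others = split_owner_and_waiters(RESOURCE_TEXT)
print("owner:", owners)
print("other:", others)
```

    Matching the two addresses against the THREAD headers above then tells you which stack belongs to the owner (the System worker thread) and which to the waiter (dwm.exe).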

    Obviously, not acquiring "locks" in a defined order is a classic bug, but some other factor must be present to "trigger" the latent bug.
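
    The circular wait can be sketched as a small wait-for graph. This is an illustrative Python model under stated assumptions, not an analysis of dxgkrnl itself: the labels are shorthand for the objects in the dump, and the push lock holder is the inferred one (it is not recorded in the dump). The helper name find_deadlock is hypothetical.

```python
# Labels are shorthand for the threads and locks seen in the dump.
WORKER = "System worker ffff998bb05d7040"
DWM = "dwm.exe ffff998ba24a9080"
ERESOURCE = "ERESOURCE ffff998ba5eb0138"
PUSH_LOCK = "EX_PUSH_LOCK"

# Which thread holds each lock. The push lock holder is inferred, not recorded.
held_by = {ERESOURCE: WORKER, PUSH_LOCK: DWM}
# Which lock each thread is blocked on.
waiting_for = {WORKER: PUSH_LOCK, DWM: ERESOURCE}


def find_deadlock(start, held_by, waiting_for):
    """Follow thread -> wanted lock -> holder until a thread repeats.

    Returns the threads forming the cycle, or [] if the chain ends.
    """
    chain = []
    thread = start
    while thread is not None and thread not in chain:
        chain.append(thread)
        lock = waiting_for.get(thread)
        thread = held_by.get(lock)
    if thread is None:
        return []
    return chain[chain.index(thread):]


cycle = find_deadlock(WORKER, held_by, waiting_for)
print(" -> ".join(cycle + cycle[:1]))

# With a consistent order (ERESOURCE first, then push lock), the second
# thread blocks on the ERESOURCE while holding nothing, so no cycle forms.
ordered_held_by = {ERESOURCE: WORKER}
ordered_waiting_for = {DWM: ERESOURCE}
print(find_deadlock(DWM, ordered_held_by, ordered_waiting_for))
```

    The second call returns an empty list: when both paths take the locks in the same order, the waiter holds nothing that the owner needs, so the owner can always finish and release.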

    Some more troubleshooting will be necessary to identify the trigger; the stacks suggest that it might be something to do with the monitor/display. Event Tracing for Windows might help us to understand what is happening. The stack trace of the second thread shown above ends with the transition from user-mode; the user-mode portion of the stack is not present in the dump but might have been useful to better understand why displays were being enumerated.

    The first step would be to consider whether anything related to the monitor/display changed near the time that the problem started to occur; we can then think about next steps.

    BTW, I just cycled the "local" section (as judged from Basel, Switzerland) of EV5 - aiming for Brindisi in summer would probably have been a fatal misjudgement of my capabilities :-)

    Gary


  2. Lucide 1 Reputation point
    2024-09-19T12:51:46.8033333+00:00

    Since this issue is now more frequent than ever, I've made a further attempt to see what I can do with these traces. I found two MS tools that can read .etl files, "Windows Performance Analyzer" and "PerfView". In PerfView, I tried the "Automated Trace Analysis" option (doesn't hurt to try, I thought).

    It actually detected something:

    User's image

    Jellyfin is a "server" software, so I don't see why it should be enumerating displays. It might be probing for hw encoders, but that's not done with modesetting APIs. It also might be completely unrelated.

    For lack of better ideas, I'll try to collect another trace without the RDP variable, and then try asking the Jellyfin team.
