Transcript of Windows NT Debugging Blog Live Chat
For those of you who could not make the live chat on 8/13, here is the transcript of the chat session.
Chat Topic: PGES-Windows NT Debugging Blog Live Chat
Date: Wednesday, August 13, 2008
Daniel (Moderator):
Hello everyone-- thanks for coming to our chat on Platforms Global Escalation Services. The chat will officially get started at 1pm Eastern time. Only questions related to this topic will be addressed during this chat. Thanks!
Daniel (Moderator):
Hello everyone-- thanks for coming to our chat on Platforms Global Escalation Services. We'll get started in about 10 minutes. You can start posting your questions now if you'd like and when the chat starts our Experts will begin answering them. Be sure to check the "Ask the Experts" box before you send your questions and please keep all questions on topic-- Thanks!
Daniel (Moderator):
Let's get started with our chat. Before we begin, though, I'd like to have our Experts introduce themselves and then they'll get started answering your questions.
Smoke [Windows Core] (Expert):
Hi everyone, I'm an Escalation Engineer with the Windows Core team. I fix bugs for a living.
Matthew [MSFT EE] (Expert):
Hello, I am an Escalation Engineer with the Platforms Global Escalation Services (Windows Core) team.
East - MSFT EE (Expert):
I am East, an Escalation Engineer with the Microsoft Platforms Global Escalation Services. (Windows Core)
Todd Webb - Msft (Expert):
I am an Escalation Engineer with the Microsoft Platforms Global Escalation Services OEM hardware team...
David (Expert):
Hi, I'm an Escalation Engineer with Windows Core - reading code & debugging is my day-to-day.
stheller (Expert):
Hi, I'm a new Escalation Engineer with Platforms GES.
Mr Ninja [MSFT EE] (Expert):
Hi, I am an Escalation Engineer with Microsoft PGES. I debug Windows for a living.
Tate [MSFT EE] (Expert):
Hi, I’m one of the EEs on the Windows team.
Jeff Dailey MSFT EE (Expert):
Hi, my name is Jeff Dailey, I’m a Senior Escalation Engineer on the Microsoft Platforms Global Escalation Services team.
Smoke [Windows Core] (Expert):
Q: How can I track memory allocations through MmAllocateContiguousMemory?
A: You could try PoolHitTag on MmCm or a breakpoint on MmAllocateContiguousMemory. If you go with the breakpoint, you can use a conditional breakpoint and dump the stack and anything else, then 'go' the system. There will be a perf hit each time you break in.
Tate [MSFT EE] (Expert):
Q: For MmAllocateContiguousMemory, will !poolused show the total amount used?
A: !poolused 2 will show MmCm
Matthew [MSFT EE] (Expert):
Q: What's the best way to go about troubleshooting pool corruption dumps?
A: Special Pool can be used to track down pool corruption problems. https://msdn.microsoft.com/en-us/library/cc265889.aspx
a-hstein (Expert):
Greetings and sorry for the late message. I am an intern in the GES group.
Mr Ninja [MSFT EE] (Expert):
Q: Could you explain the reasons why a memory dump analysis show an "illegal instruction" exception raised from a valid instruction?
A: There are many reasons this could happen. The instruction that was executed may not be what you see, due to hardware problems such as a bit flip in the instruction when it was executed. It is also possible for a hardware problem to cause an exception to be raised on a valid instruction. Sometimes software, or hardware, may trigger a jump to the middle of an instruction so that the instruction being executed is not what you think it is. I described a problem where we executed from the middle of an instruction in the blog https://blogs.msdn.com/ntdebugging/archive/2008/04/28/ntdebugging-puzzler-0x00000004-this-didn-t-puzzle-the-debug-ninja-how-about-you.aspx.
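As an illustration of the last point (not part of the chat), here is a minimal sketch of how the same bytes can decode to a valid instruction from one offset and to an illegal one from another. The toy decoder is hypothetical and understands only three x86 encodings: `B8 imm32` (mov eax, imm32), `0F 0B` (ud2, which raises an invalid-opcode exception), and `C3` (ret). The byte sequence is made up for the example.

```python
import struct

def decode_one(code: bytes, offset: int) -> str:
    """Decode a single instruction at `offset` (toy decoder, three opcodes only)."""
    opcode = code[offset]
    if opcode == 0xB8:  # mov eax, imm32: opcode byte plus a 4-byte immediate
        (imm,) = struct.unpack_from("<I", code, offset + 1)
        return f"mov eax, 0x{imm:08x}"
    if opcode == 0x0F and offset + 1 < len(code) and code[offset + 1] == 0x0B:
        return "ud2"  # architecturally defined invalid opcode
    if opcode == 0xC3:
        return "ret"
    return f"(unknown opcode 0x{opcode:02x})"

# Hypothetical code bytes: mov eax, 0x00000b0f ; ret
code = bytes([0xB8, 0x0F, 0x0B, 0x00, 0x00, 0xC3])

intended = decode_one(code, 0)  # what a disassembly from the function start shows
actual = decode_one(code, 1)    # what executes after a stray jump to offset 1
print(intended)
print(actual)
```

Disassembling from the start of the instruction shows a harmless `mov`, while a jump one byte into it lands on the immediate operand, which happens to encode `ud2`. A dump taken at that point can look like an illegal-instruction exception on a valid instruction unless you disassemble from the faulting address itself.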
Smoke [Windows Core] (Expert):
Q: We use APCs to perform certain operations; one of them is to have a thread clean up and exit. Is calling thread exit from an APC recommended? This used to work fine, but with newer service packs we have threads exiting while holding the heap lock!
A: This sounds like a bad idea. I would expect this to break in various ways (just as you have observed).
David (Expert):
Q: We use APCs to perform certain operations; one of them is to have a thread clean up and exit. Is calling thread exit from an APC recommended? This used to work fine, but with newer service packs we have threads exiting while holding the heap lock!
A: Part of the problem is that if ExitThread is called, any pending APCs on that thread's queue are lost.
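The hazard in this exchange can be sketched with a small simulation. This is an analogy in Python threading, not Windows code: `heap_lock`, `pending_apcs`, and the two APC functions are hypothetical stand-ins, and the "alertable wait" is modeled as a loop that delivers queued callbacks while the lock is held.

```python
import threading
from collections import deque

heap_lock = threading.Lock()  # stands in for the process heap lock
pending_apcs = deque()        # stands in for the thread's APC queue
ran = []                      # records which APCs actually executed

class ThreadExit(Exception):
    """Raised by an APC to model a thread exit from inside the APC."""

def apc_exit_thread():
    raise ThreadExit

def apc_flush_log():
    ran.append("flush_log")

def worker():
    pending_apcs.extend([apc_exit_thread, apc_flush_log])
    heap_lock.acquire()       # the thread is in the middle of a heap operation
    try:
        while pending_apcs:   # modeled alertable wait: queued APCs are delivered
            pending_apcs.popleft()()
    except ThreadExit:
        return                # the thread is gone; nothing releases heap_lock
    heap_lock.release()

t = threading.Thread(target=worker)
t.start()
t.join()

lock_still_held = not heap_lock.acquire(timeout=0.1)
never_ran = [apc.__name__ for apc in pending_apcs]
print("heap lock orphaned:", lock_still_held)
print("APCs never run:", never_ran)
```

In this model the exiting thread leaves the lock held, so any other thread that needs it would block, and the APC queued behind the exit is never delivered. Both outcomes match what the question and the answer above describe.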
Matthew [MSFT EE] (Expert):
Q: This question is in reference to special pool mentioned already. Is this article essentially the same as the MSDN reference? https://support.microsoft.com/kb/188831/en-us
A: The KB article documents enabling special pool via the registry, rather than verifier. These are two different ways to accomplish the same thing. Enabling it via the registry is sometimes preferred, since verifier enables additional checks beyond special pool.
East - MSFT EE (Expert):
Q: We use APCs to perform certain operations; one of them is to have a thread clean up and exit. Is calling thread exit from an APC recommended? This used to work fine, but with newer service packs we have threads exiting while holding the heap lock!
A: Would KB254956 help?
- If not, we would need to follow up with you for more information.
East - MSFT EE (Expert):
Is there anything additional you want on the blog that we have not done?
Jeff Dailey MSFT EE (Expert):
Q: When is the next Windows Internals exam scheduled? I would like to plan ahead.
A: The final version of the Windows Internals Exam should be available before December 2008. I’d like to thank all the community members that participated in the Beta. Your feedback was very valuable.
Matthew [MSFT EE] (Expert):
Q: Will we get more puzzlers on the blog?
A: We’d like to do more puzzlers, but unfortunately they tend to take a lot of time, so I cannot say for sure when/if we’ll have more.
Matthew [MSFT EE] (Expert):
Q: How many of you in the audience are interested in more puzzlers on the ntdebugging blog?
Smoke [Windows Core] (Expert):
Q: Are you planning to write a book?
A: Windows Internals is a great reference book that we all rely upon. Additionally, you can check out: <https://www.amazon.com/Advanced-Debugging-Addison-Wesley-Microsoft-Technology/dp/0321374460>
Tate [MSFT EE] (Expert):
Q: As far as the blog is concerned, I'm more a fan of the case-study posts where you go through how you troubleshot issues that you have encountered.
A: So are we!!!
Smoke [Windows Core] (Expert):
Q: I'm very interested in puzzlers...
A: Thanks for the feedback. We will try to create some more in the future.
Smoke [Windows Core] (Expert):
Q: Debugging MPI apps - sometimes a crash happens on remote and the local smpd daemon will terminate the process being debugged. Using the debugger, is there a way to guard from TerminateProcess from the child? I guess that would break some security models.
A: I'm not sure what MPI is, but this scenario sounds just like a service. The service control manager will kill the service if it doesn't respond in a timely fashion. With a service, there is a registry key to extend the timeout. If such a mechanism isn't available for you, you should consider instrumentation/logging.
East - MSFT EE (Expert):
Q: I just skimmed over KB254956; we found APC to work. The issue here is that there are alertable waits in library modules like LSA/NDR/I_RPC calls where our APC fires; it raises a user exception, which gets handled and exits, so the thread exits holding the heap lock.
A: We would need to discuss this further offline. How can I contact you?
Matthew [MSFT EE] (Expert):
Q: A puzzler prize, like the next edition of Windows Internals, would definitely have my full attention. :)
A: We'll consider it... thanks for the feedback!
Jeff Dailey MSFT EE (Expert):
Q: Have you ever found yourselves with an "unsolvable" case? :P
A: No case is unsolvable; nothing is truly random. Some cases may take a very long time to resolve through multiple debugging passes, detailed code review, reverse engineering and multiple iterations of instrumentation. In the end we find the problem.
Daniel (Moderator):
Just a heads-up --we have about 15 minutes left in today's chat. Be sure to post your questions asap and our Experts will try to answer as many as possible before the chat ends. Thanks.
Mr Ninja [MSFT EE] (Expert):
Q: Tri-boot machine - XP, Server 2003 and Server 2000 with 2000 being the last one installed. After awhile, I got an error: "Windows 2000 could not start because the following file is missing or corrupt: \WINDOWS\SYSTEM32\CONFIG\SYSTEMd startup options for"..
A: That is usually a known issue in Windows 2000 caused by the system hive becoming too large. Several KB articles describe this issue: KB269075, KB306038, KB323148, and KB277222 contain various resolutions you can try. I have found that most often the steps in KB277222, using scrubber in a shutdown script, resolve this problem. Starting with Windows 2003 we changed the boot architecture to prevent this problem; KB302594 describes this improvement.
Tate [MSFT EE] (Expert):
Q: Do you guys use USB debugging in Vista/2008? Why is it that there is still one vendor that sells the debug dongle?
A: Serial debugging works well enough most times. Usually we try these alternatives only if we have hardware that, for some reason, doesn't have a serial connection and only has USB or FireWire...
East - MSFT EE (Expert):
Q: I just skimmed over KB254956; we found APC to work. The issue here is that there are alertable waits in library modules like LSA/NDR/I_RPC calls where our APC fires; it raises a user exception, which gets handled and exits, so the thread exits holding the heap lock.
A: On a better note, it would be best to open a case with Microsoft Support -> <https://support.microsoft.com/> -> Need more help? -> Select a Product to start
Jeff Dailey MSFT EE (Expert):
Q: What companies are in attendance today?
Graham (Expert):
Q: There are lots of post-mortem debuggers available: Dr Watson, NTSD, windbg, userdump, WER. Which ones do you usually recommend your customers use if you need to be sure to capture a dump from a crash?
A: Userdump.exe is quite reliable for obtaining post-mortem dumps, and is easy to use. Userdump and ADPlus (which uses CDB) are good because they attach to the process and monitor exceptions, and can create dumps at times when a JIT debugger would not be able to create a thread in the process to obtain the dump. Normally, I will set up drwtsn32 first, and if it cannot generate the dump, then I will go to userdump.
Smoke [Windows Core] (Expert):
Q: How can I debug cases in which I have just the minidump for a CPU hog? I tried !runaway and it does not work.
A: The minidump alone may not be enough information. You could try to look at the stacks and guess at what is using the CPU, but that requires familiarity with the application. You should capture a circular perfmon log with thread data. Then get 3-5 dumps of the app. From the perfmon log, you'll see what threads are active (and their activity profile). From the dumps, you'll have a few snapshots of the process in motion. Alternatively you could try a profiler like xperf.
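The reason for taking several snapshots rather than one can be sketched as follows. This is a minimal illustration, not a tool mentioned in the chat: the function name, thread IDs, and CPU times are all hypothetical, and it assumes you have already collected per-thread CPU times from two points in time, however you obtained them.

```python
def rank_by_cpu_delta(first, last):
    """Return (thread_id, cpu_seconds_used) pairs, busiest thread first."""
    deltas = {
        tid: last[tid] - first[tid]
        for tid in last
        if tid in first  # skip threads created between the two snapshots
    }
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)

# Hypothetical per-thread CPU seconds from two snapshots taken some time apart.
snapshot_1 = {0x1A4: 12.0, 0x2F0: 3.5, 0x3C8: 40.25}
snapshot_2 = {0x1A4: 12.5, 0x2F0: 31.0, 0x3C8: 40.25, 0x410: 0.5}

ranking = rank_by_cpu_delta(snapshot_1, snapshot_2)
for tid, used in ranking:
    print(f"thread 0x{tid:x}: {used:.2f}s CPU between snapshots")
```

In this made-up data, thread 0x3c8 has the largest total CPU time but used none between the snapshots, while thread 0x2f0 is the one currently consuming the CPU. A single snapshot would point at the wrong thread, which is why the answer asks for a log over time plus several dumps.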
David (Expert):
Q: Are there any free code coverage tools on Windows?
A: This article describes how to obtain code coverage data:
David (Expert):
A: https://msdn.microsoft.com/en-us/library/ms182496.aspx
stheller (Expert):
https://www.microsoft.com/whdc/devtools/tools/prefast.mspx discusses the PREfast static source code analysis tool
East - MSFT EE (Expert):
Q: Are there any free code coverage tools on Windows?
A: Please keep watching our blog site for the next chat - <https://blogs.msdn.com/ntdebugging> or you can submit the question to our blog site
Daniel (Moderator):
Well we're out of time for today's chat. Thank you very much to all of our guests who joined us today as well as to our Experts for answering so many great questions. Have a great day!
Comments
Anonymous
August 14, 2008
Hi, I would like some hints on how to debug a situation where lsass is using 100% of CPU, as we have seen this a number of times.
Anonymous
August 19, 2008
Hi deckkh, for this type of problem it helps to configure symbols[1] in Process Explorer[2]. Then, when the problem happens, sort the CPU column in Process Explorer in descending order - what is the process that is consuming the most CPU? Once you identify that process, visit its Threads tab in the Process properties, sort by Cycles Delta (descending) [Vista] or CSwitch Delta (descending) [XP/2003], and get the full stack of the topmost thread(s). With the symbols resolved, check the modules and function names for clues as to what the thread is doing... [1]=http://forum.sysinternals.com/forum_posts.asp?TID=12683&PID=57745#57745 [2]=http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx