PDC Day 2

Another fun day at the office today, with a host of different customer issues and questions to work on.

When I get to the track lounge, and before I can escape to get some breakfast, I spy Bob Davidson, the C++ Dev Manager and a good pal, staring at a debugger screen.  As a dyed-in-the-wool debugger guy, I can't resist coming over to figure out what's happening.  The customer has moved his app over to 7.0 and has noticed that when he calls one of the CRT functions, in this case strcpy, his app crashes with EIP pointing to a seemingly random place.  Bob and I step through the disassembly, and it looks like the import table has the wrong address in it; the entry seems to point into another piece of code.

There have been some major changes to the way applications link against the CRT, specifically the use of manifests to bind applications to the single instance of the CRT installed on the machine, so I'm immediately suspicious that some older version is lying around in the WinSxS Fusion directory.  I whip out depends from the command line (admittedly, it had moved from PlatformSDK to the Common directory, and I hadn't even noticed) and run it on the customer's DLL.  It complains that ATL80.DLL is not present, but I dismiss this as bogus: the customer's DLL was loaded and I executed code in it, therefore it must just be a path issue in the command window I ran depends from.  A scan over the list of imports in depends shows nothing amiss.

Confused and puzzled, I start to dial back on the linker smarts, e.g. Edit and Continue support and incremental linking.  Another run - same problem.  Now I'm concerned that some sort of corruption is writing over the IAT in the customer's DLL.  I find the address of the particular IAT entry I'm interested in.  I run to a point where the customer says the DLL is loaded, set a data breakpoint on that address, hit F5, and watch it crash again without ever hitting the data breakpoint.  I figure we are too late by this time, so I ask, "So where do you load the DLL?"

We step in and I see a piece of code similar to this:

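// First try a normal load, which resolves imports and runs DllMain.
HMODULE hDll = LoadLibraryEx(mydllpath, NULL, 0);
if (!hDll)
{
   // Fallback: maps the image but leaves its import table unresolved.
   hDll = LoadLibraryEx(mydllpath, NULL, DONT_RESOLVE_DLL_REFERENCES);
}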

And of course it turns out that the catch-all code is being used for this customer's DLL.  DONT_RESOLVE_DLL_REFERENCES basically loads any code DLL as an image but does not connect the calls this DLL makes to the other DLLs it needs (i.e. it never fills in the strcpy address in the import table).  So why does it fail the first time, when we are just asking for it to be loaded as a normal code DLL?  Well - it turns out depends was not bogus.  This DLL could not find ATL80.DLL because it did not have the correct manifest.  Needless to say, I felt validated as someone who can still actually debug things - I don't get as much time to dig into problems like this as I used to.  Still, for 20 minutes of the 30-minute session, I was thinking "Am I going to be able to solve this?"  Maybe my "IRON Debugging" competition wouldn't be such a great idea after all...

Next up, Matt hooks me up with a customer who is struggling with badly performing code.  The code iterates over a large collection of Double values, which is something I actually put in the hands-on lab.  The profiler points at the major issues - there are a ton of calls to get_Count and get_Item on his ArrayList.  A quick look at his code shows the call to get_Count in the for loop condition and the implicit calls to get_Item from the [] syntax on the ArrayList.  Add to that the implicit unboxing of every element and you have a scary time on your hands.  So I had two simple pieces of advice - hoist the get_Count call out of the loop, and switch to a generic list of doubles.  I come back to find his app has gone from 8 seconds to 1 second on this simple advice.  Again - I feel validated :).
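The customer's code was managed, so what follows is only a rough C++ analogue of the two changes, not his code.  I'm assuming a std::vector<std::any> as a stand-in for the boxed ArrayList and a std::vector<double> as a stand-in for the generic list; the function names are made up for illustration, and it needs a C++17 compiler.

```cpp
#include <any>
#include <cstddef>
#include <vector>

// Analogue of the slow version: every element is "boxed" in a std::any,
// and size() is re-evaluated in the loop condition on every iteration.
double sum_boxed(const std::vector<std::any>& values) {
    double total = 0.0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        // Type-checked "unboxing" of each element.
        total += std::any_cast<double>(values[i]);
    }
    return total;
}

// Analogue of the fix: typed storage, with the count read once up front.
double sum_typed(const std::vector<double>& values) {
    double total = 0.0;
    const std::size_t count = values.size();  // hoisted out of the loop
    for (std::size_t i = 0; i < count; ++i) {
        total += values[i];  // no unboxing, no per-element type check
    }
    return total;
}
```

Both functions return the same sum; the difference is purely in how much work each iteration does.  A C++ optimizer may well hoist size() on its own, so don't expect the 8x the customer saw in managed code - measure it.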

Next up, I headed up to Rico's talk on tips and tricks for writing performant managed code.  Rico started off by emphasizing the need to measure when doing performance work - to paraphrase, "the top 10 things I tell people are measure, measure, ... <add 7 more measures>, and measure".  Rico called out the great profiler in Team System (yay!) and also the CLRProfiler.  A lot of what he talked about is free (or nearly free) perf gains in the CLR, especially the improved ngen story, the use of generics, and how exceptions, reflection and full trust got cheaper.  Some of these, such as ngen, require some opt-in on your part, but pay huge benefits in certain common scenarios.  He also mentioned improvements in the base class library.  For more details, hop on over to his blog.

Later in the track lounge I hear some interesting customer scenarios.  At these types of events I'm continually blown away by the sheer power and scale of what our customers are doing with the tools we build.  One chap was discussing symbol server, which I'm delighted to hear customers are making use of.  More interesting is the fact that they have over 1TB worth of symbols stored for their shipped executables.  Wow.  Of course, with today's hard drive prices, you could probably do 1TB for around $500.  But still - 1TB.  The customer's question is about accessing the symbol server from the other side of a firewall, so I try to hook him up with the right contact at MS (you know who you are, Mr. DbgHelp).

Sean and Nick, the code analysis guys, happen to talk over some ideas about analyzing which exceptions code is likely to throw, and whether we could have annotations that would help drive that analysis.  At this point late in the day, I couldn't stop myself from getting on my high horse, but it really is one of my most basic principles.  The fact that the constructs are called "exceptions" should clue people in that they should be used in exceptional circumstances.  But too many people nowadays don't think about what they cost, and so use exception handling, especially in managed code, as flow control in their programs.  Rico said it best to me later in the day - (and I paraphrase here) "Think of an exception happening in your program as three beeps from your console and a 1 second delay - if you can live with that - fine".  I guess I don't mind if people want to use it, but please, don't be surprised if performance becomes an issue.  Each try block is basically free, but throwing is a lot of work for the CLR, which needs to unwind the stack and run the finally blocks.  It's faster than it was in 1.1, but it just ain't free.
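To make the "flow control" point concrete, here is a minimal sketch of the two styles.  It is a C++ analogue (C++17), not CLR code, so the relative costs will differ from what Rico was describing; the function names are hypothetical.  The first version leans on std::stoi throwing for every bad input, while the second uses std::from_chars, which reports failure through a status code and never throws.

```cpp
#include <charconv>
#include <optional>
#include <stdexcept>
#include <string>
#include <string_view>
#include <system_error>

// Exception as flow control: every malformed input costs a throw and an unwind.
int parse_or_default_throwing(const std::string& text, int fallback) {
    try {
        return std::stoi(text);
    } catch (const std::exception&) {
        return fallback;  // the "expected" failure path goes through catch
    }
}

// Status-returning style: bad input is an ordinary result, nothing is thrown.
std::optional<int> try_parse(std::string_view text) {
    int value = 0;
    const char* first = text.data();
    const char* last = first + text.size();
    const auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc() || ptr != last) {
        return std::nullopt;  // not a number, out of range, or trailing junk
    }
    return value;
}

int parse_or_default(std::string_view text, int fallback) {
    return try_parse(text).value_or(fallback);
}
```

If bad input is rare, the first version is fine.  If you are parsing a file where half the fields are junk, the second keeps the failure path as cheap as the success path.  Note that the two are not drop-in equivalents: std::stoi skips leading whitespace and accepts trailing characters, while this try_parse rejects both.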

The conference ended early as folks headed to the attendees' party.  Tomorrow looks to be a long and rewarding day, however: as well as a talk on the Team System bits that I'm directly involved with, we'll have the Meet the Team System Team and Ask the Experts events until late.  Matt and I will be there again, trying to answer all your profiling, debugging and general diagnostics questions.  Come talk to us!

JoC