Stress testing Visual Studio 2010
In the past several months Visual Studio and I have been really busy stress testing each other. This post is a general overview of what we've been up to and what kind of testing we're doing. I've learned a lot about stress testing and I have to say it's actually a lot of fun, so I guess it's worth sharing. I'll try to make this a series, diving into more technical detail in upcoming posts.
Background
During Beta 1 and Beta 2 it became painfully obvious that the new VS had an obesity problem: it was slow, consumed a lot of memory, and, worst of all, with enough modules loaded it stopped fitting into the 2 GB address space on 32-bit machines. There were several reasons for this, which Rico Mariani, Brian Harry and others have blogged about extensively. In a nutshell, with a lot of new functionality, a lot more modules were loaded into memory. Besides, we now had to fully load the CLR and WPF at application startup. Moreover, there were all kinds of memory leaks all over the place.
Making performance a top priority
Of course this wasn't good, so our management made the right decision to make performance our top priority. Jason really took it seriously and we dedicated a lot of people to work full time on making Visual Studio fast and lean. As part of this effort I became a member of a virtual team called "Perf SWAT". This team is responsible for essentially three things: performance, memory consumption and design-time stress.
Performance is clear: we need to be fast. Memory consumption is clear too: when we load, we need to take as little memory as possible, and avoid things such as double-loaded modules, loading both NGEN and IL versions of an assembly, and so on.
Design-time stress on the VSL team
As for design-time stress, the goal is that once we're loaded into memory, JITted, warmed up and all the caches are filled, we should not continue to grow in memory consumption. This means finding and eliminating all memory and resource leaks. Run-time stress means finding leaks in the CLR and BCL; design-time stress means finding leaks in VS and its tooling. I am responsible for design-time stress testing on the VSL team (managed languages). I need to make sure that there are no significant leaks in four areas:
- C# IDE and editor integration (C# code editor, navigation, refactorings and other core C# areas)
- VB IDE and editor integration
- F# IDE
- Hostable Editor (Workflow Designer in VS 2010 is essentially hosting a full-blown language service to show IntelliSense in the expression editor on the workflow diagram)
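To give a rough idea of what a design-time stress test does, here is a minimal sketch (hypothetical code, not our actual harness): warm the feature up first so that JITting and cache population don't get counted as growth, then repeat the same activity over and over while sampling the process working set.

```csharp
using System;
using System.Diagnostics;

static class StressLoop
{
    // Hypothetical sketch of one stress test: warm up, then repeat a single
    // activity (editing, navigation, ...) and sample memory every iteration.
    public static void Run(Action oneIteration, TimeSpan duration)
    {
        // Warm-up: let JITting, caches and lazily loaded modules settle
        // before we start attributing memory growth to the feature.
        for (int i = 0; i < 10; i++)
            oneIteration();

        Process process = Process.GetCurrentProcess();
        Stopwatch stopwatch = Stopwatch.StartNew();
        int iteration = 0;

        while (stopwatch.Elapsed < duration)
        {
            oneIteration();
            iteration++;

            process.Refresh(); // re-read the memory counters
            Console.WriteLine("{0},{1},{2}",
                iteration, stopwatch.Elapsed, process.WorkingSet64);
        }
    }
}
```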
Progress
The good news is that we've made tremendous progress since Beta 2 and have brought the product into a much better state: it is much faster and more responsive, takes up much less memory, and we hope to have eliminated all major known memory leaks. A common complaint was that VS grew in memory during usage and you had to restart it after a certain time. Right now we hope that you can mostly keep Visual Studio open for days (even weeks) without having to restart it.
8 hour stress tests
The official sign-off criterion is that the end user needs to be able to keep VS open for an entire work week without any noticeable performance degradation (that is, 5 days times 8 hours a day). We've calculated that, on average, 40 hours of continuous human usage is equivalent to running our tests for 8 hours (the tests do things faster than a human).
We have identified and implemented a series of 22 tests covering the four language areas mentioned above. Each test covers one continuous kind of activity, e.g. CSharpStressEditing, CSharpStressNavigation, CSharpStressIntelliSense, CSharpStressDebugging, CSharpStressUI, VBStressEditing, VBStressProjectSystem, FSharpStressEditing, and so on.
Each test runs for 8 hours on a machine in the lab and VS memory usage details are automatically logged. We've also developed tools to automatically analyze the stress logs and produce Excel spreadsheets and charts for analysis and reporting.
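For a sense of what such logging might look like, here's a small hypothetical sketch (the actual lab tooling is more involved): an external watcher that samples the devenv process once a minute and appends to a CSV file, which Excel can chart directly.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Threading;

static class StressLogger
{
    // Hypothetical external logger: once a minute, record Visual Studio's
    // memory counters into a CSV file for later charting in Excel.
    // Assumes a devenv.exe process is already running.
    public static void Main()
    {
        Process vs = Process.GetProcessesByName("devenv")[0];
        using (StreamWriter log = new StreamWriter("stress-log.csv"))
        {
            log.WriteLine("time,workingSetMB,privateMB");
            while (!vs.HasExited)
            {
                vs.Refresh(); // re-read the counters for the live process
                log.WriteLine("{0:HH:mm:ss},{1},{2}",
                    DateTime.Now,
                    vs.WorkingSet64 / (1024 * 1024),
                    vs.PrivateMemorySize64 / (1024 * 1024));
                log.Flush();
                Thread.Sleep(TimeSpan.FromMinutes(1));
            }
        }
    }
}
```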
Several months ago a typical test would start at about 300 MB of process working set and crash after several hours with an OOM (out-of-memory) exception. None of the tests could even run for the full 8 hours. After finding and fixing a lot (a lot!) of bugs, we were able to get them running for 8 hours – VS memory usage grew from about 300-400 MB of working set to over 1 GB over the period of 8 hours (that was anywhere from 200 to 500 stress iterations).
Right now a typical test starts at about 150-200 MB and finishes 8 hours later at 200-300 MB. Also, instead of 500 iterations, it is able to do 3000-5000 iterations in those 8 hours on the same hardware, which means we've made the product considerably faster and also reduced the leaks in major feature areas to a minimum (right now a feature is considered not leaking if the average increase is less than ~5 KB per iteration).
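As a back-of-the-envelope illustration of how a threshold like that could be applied (my own sketch, not the team's actual analysis tool): take the post-warm-up baseline and the final working set from a run, and divide the growth by the number of iterations.

```csharp
using System;

static class LeakCheck
{
    // The ~5 KB/iteration threshold mentioned above.
    const long ThresholdBytes = 5 * 1024;

    // Hypothetical check: average working-set growth per iteration.
    static bool IsLeakFree(long baselineBytes, long finalBytes, long iterations)
    {
        long growthPerIteration = (finalBytes - baselineBytes) / iterations;
        Console.WriteLine("Average growth: {0} bytes/iteration", growthPerIteration);
        return growthPerIteration < ThresholdBytes;
    }

    static void Main()
    {
        // Example: 200 MB -> 215 MB over 4000 iterations
        // = 15 MB / 4000 ≈ 3.8 KB per iteration, under the threshold.
        bool ok = IsLeakFree(200L << 20, 215L << 20, 4000);
        Console.WriteLine(ok ? "within threshold" : "flagged as leaking");
    }
}
```

In practice you'd probably fit a line through all the samples rather than just using the endpoints, since GC activity makes the working set noisy from iteration to iteration.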
I'll try to continue blogging about our stress testing and dive more into the technical details: what we measure, how we measure, how we find bugs and how we'll know when we're eventually done.
Comments
Anonymous
February 07, 2010
This sounds great! As a developer, if VS2010 slows me down, that's a solid reason not to upgrade.

Anonymous
February 07, 2010
Great article! Keep 'em coming.

Anonymous
February 07, 2010
Didn't understand the following statement very well: "right now a feature is considered not leaking if there is an average increase of less than ~5KB per iteration". If there is no memory leak, why is there still this ~5 KB memory consumption increase on each iteration?

Anonymous
February 07, 2010
I think "leaking" here might mean just growth of consumed memory. not actual leaking..Anonymous
February 07, 2010
Very interesting. I'd very much like to know more about the tools you use for stress testing and analysis. Do you use any profiling tools to find out performance bottlenecks? Do you use any specific tools to detect memory that wasn't freed?

Anonymous
February 08, 2010
More interesting question: how does VS2010 pass your own SA tools? :) Would be interesting to see any PREfast results...

Anonymous
February 08, 2010
I hope you are also running the tests in combination with each other, as well as in isolation. I am likely to perform every one of the activities you mentioned in a single day. As Brian Harry has been blogging about, over-emphasizing repeatable tests at the expense of real-world scenarios is apparently what led to a lot of the performance problems in the first place.

Anonymous
February 08, 2010
Thanks everyone for your questions! I'll answer them at length in the upcoming posts. Luciano: suppose the test is editing a file. Undo history keeps track of all changes to that file. The Undo buffer is growing with every edit, but you can't consider it a memory leak. Hope this clarifies.

Anonymous
February 08, 2010
I really appreciate the transparency that the Visual Studio and .NET teams have been showing over the last couple of years. Don't let the marketing folks shut you up! Real honesty and transparency, even when it's not flattering, makes for much better long-term marketing. Everybody else knows about the problems: it can only enhance Microsoft's credibility when they talk about them as well. (And conversely, failure to talk openly about them makes it seem like folks at MS are out of touch -- like the MS manager I heard about recently who supposedly told everyone on his team that they needed to get rid of their iPhones or they'd be fired.)

Anonymous
February 08, 2010
Wonderful!!! Please remember that even small changes in performance make a huge difference in the lives of countless developers. Beta 2 was really slow all around. And using the WPF designer made the memory footprint go through the roof. Can't wait to get the final bits.

Anonymous
February 08, 2010
Thanks for the great feedback. I sincerely hope it is better in terms of performance than VS 2008, which is a sloth when developing programs that use the data designer with lots of tables / stored procedures.

Anonymous
February 08, 2010
Hi, which version of VS do you guys use to develop VS2010? It would be interesting to know if you use VS2010 to debug VS2010! Yes, trivial, but then I can tell the whiners that MS's products, even in beta/RC, can handle their own weight on a serious product like VS.

Anonymous
February 08, 2010
I really appreciate your honesty. My trust is bigger now!

Anonymous
February 09, 2010
From comparing the same project build in various environments, including an Atom 330 and a Q8400, both with plenty of RAM and an HDD with relatively good average seek time, on XP, I can say that 2010 mostly depends on CPU performance. Tuning the overclocked CPU frequency on the Atom shows a linear improvement in total build time. Is there any unified benchmark project to measure VS performance?

Anonymous
February 11, 2010
Just mouse wheeling up and down causes the memory to gradually climb. It seems to be sensitive to large linq2sql code nests. It is not a happy camper when running alongside vs2008 on my old xp 4ht work machine and has managed to totally freeze on several occasions. I find regularly saving all helps a bit. I also notice that it caches for reload if I close and re-open a project. Could this be optionally turned off? Because it comes back just as dead as it was when closed. Keep up the good work!

Anonymous
February 13, 2010
@Kirill Osenkov > "Thanks everyone for your questions! I'll answer them at length in the upcoming posts." Great! I can't wait for your at-length answers then. You did it just like Rico Mariani and Brian Harry did, IMHO, and I really like it. Thank you!

Anonymous
February 14, 2010
That is great news! Concerning stress tests, I understand the need to make them run in a loop for hours. But it is also essential to measure the percentage of code covered by the tests.